Introduction


Genome-wide association studies (GWAS) of breast cancer have been completed among
populations of European ancestry, and several regions have been identified that appear to
contribute susceptibility to this cancer. Recent data suggest, however, that studies limited to
Whites of European ancestry will not reveal all risk alleles for common cancers, and that
similar efforts in other racial and ethnic populations will be needed to identify the full spectrum of
common risk alleles that contribute to disease risk in the population. To identify genetic risk
alleles for breast cancer among African American women, we have performed a well-powered
whole-genome association scan. For this project we established a collaborative network of
investigators whose careers have been dedicated to studying breast cancer in minority
populations; each contributed samples and covariates from their respective studies to identify
genetic variants that contribute to risk of breast cancer in this minority population. We have
completed a GWAS of >1 million SNPs in >3,000 African American breast cancer cases and
>2,700 controls. With these data we have validated and improved upon markers of risk at the
known breast cancer risk regions, so that they better characterize the contribution of these
regions to breast cancer risk in women of African ancestry. In collaboration with GWAS in
populations of European ancestry we have also revealed novel risk loci for breast cancer,
including regions that contribute to risk for estrogen receptor (ER)-negative breast cancer.

Body 

The  Specific  Aim  of  this  application  was  to  identify  genetic  risk  alleles  for  breast  cancer  among 
African  American  women  by  performing  a  well-powered  whole-genome  association  scan.  Here 
we  describe  the  major  research  accomplishments  associated  with  each  task  outlined  in  the 
approved  Statement  of  Work  as  well  as  additional  novel  findings  and  scientific  contributions  that 
have  emanated  from  this  work. 

Task 1: To genotype 1,000,000 single nucleotide polymorphisms (SNPs) using the Illumina
Infinium  1M  technology  in  1,000  invasive  African  American  breast  cancer  cases  and  1,000 
African  American  controls. 

With  the  costs  of  genotyping  decreasing  we  were  able  to  genotype  >3,000  cases  and  >2,800 
controls.  These  samples  were  selected  from  the  studies  participating  in  this  effort  (Table  1). 


Table  1.  African  American  Breast  Cancer  Studies. 


Study | Full Name | Cases | Controls
MEC | Multiethnic Cohort | 734 | 1027
BWHS | Black Women’s Health Study | 825 | 1170
CARE | The Los Angeles component of The Women's Contraceptive and Reproductive Experiences Study | 380 | 224
CBCS | The Carolina Breast Cancer Study | 656 | 608
NBHS/SCCS | The Nashville Breast Health Study/Southern Community Cohort | 1242 | 1002
WAABCO | Pennsylvania, Nigeria, Barbados, Baltimore, Chicago | 1281 | 1148
PLCO | Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial | 64 | 133
SFBCS/NC-BCFR | The San Francisco Bay Area Breast Cancer Study / Northern California Breast Cancer Family Registry | 612 | 284
WCHS | The Women’s Circle of Health Study | 272 | 240
WFBC | Wake Forest University Breast Cancer Study | 125 | 153
WHI | The Women's Health Initiative | 316 | 316
WISE | The Women's Insights and Shared Experiences Study | 145 | 367
TOTAL | | 6661 | 6622

Details of genotyping, quality control, statistical analysis and results of the GWAS are
described below as well as in our recent publication 1 (Chen et al, Human Genetics,
2012, see Appendix). Tables and Figures that support this work are provided in the attached
manuscript.

Genotyping in stage 1 was conducted using the Illumina Human1M-Duo BeadChip. Of the 5,984
samples from these studies (3,153 cases and 2,831 controls), we attempted genotyping of
5,932, having removed samples (n = 52) with DNA concentrations <20 ng/µl. Following
genotyping, we removed samples based on the following exclusion criteria: 1) unexpected
replicates (>98.9% genetically identical) that we were able to confirm through discussions with
study investigators (only one of each replicate was removed, n = 15); 2) unknown replicates that
we were not able to confirm (pair or triplicate removed, n = 14); 3) samples with call rates <95%
after a second genotyping attempt (n = 100); 4) samples with <5% African ancestry (n = 36)
(discussed below); and 5) samples with <15% mean heterozygosity of SNPs on the X
chromosome and/or similar mean allele intensities of SNPs on the X and Y chromosomes
(n = 6), as these are likely to be males.
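The sample exclusions listed above can be checked with simple bookkeeping. The short sketch below uses only the counts quoted in the text; the variable and dictionary key names are illustrative labels, not identifiers from the study.

```python
# Sample counts reported for stage 1 (taken from the text above).
initial_samples = 3153 + 2831   # cases + controls = 5,984
low_dna = 52                    # removed before genotyping (DNA <20 ng/ul)

# Post-genotyping exclusions, in the order of criteria 1) to 5).
exclusions = {
    "confirmed_replicates": 15,
    "unconfirmed_replicates": 14,
    "low_call_rate": 100,
    "low_african_ancestry": 36,
    "likely_male": 6,
}

attempted = initial_samples - low_dna
final_samples = attempted - sum(exclusions.values())

print(attempted)      # samples with genotyping attempted
print(final_samples)  # samples remaining for analysis
```

The remaining count agrees with the 3,016 cases and 2,745 controls reported for the final analysis dataset.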

We  removed  SNPs  with  <95%  call  rate  (n  =  21,732)  or  minor  allele  frequencies  (MAFs) 
<1%  (n  =  80,193).  To  assess  genotyping  reproducibility  we  included  138  known  replicate 
samples;  the  average  concordance  rate  was  99.95%  (>99.93%  for  all  pairs).  We  also  eliminated 
SNPs  with  genotyping  concordance  rates  <98%  based  on  the  replicates  (n  =  11,701).  The  final 
analysis  dataset  included  1,043,036  SNPs  genotyped  on  3,016  cases  and  2,745  controls,  with 
an  average  SNP  call  rate  of  99.7%  and  average  sample  call  rate  of  99.8%.  Hardy-Weinberg 
equilibrium  (HWE)  was  not  used  as  a  criterion  for  removing  SNPs;  none  of  the  SNPs  selected 
for  replication  deviated  from  HWE  in  controls  in  each  study  (based  on  a  cut-off  of  p<0.001). 
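As a rough illustration of the SNP-level filters described above (call rate of at least 95% and MAF of at least 1%), the following sketch applies both thresholds to a single SNP. It assumes genotypes are coded as counts of one allele (0, 1 or 2) with `None` for a missing call; the function names are hypothetical and this is not the pipeline used in the study.

```python
def call_rate(genotypes):
    """Fraction of samples with a non-missing genotype call."""
    called = [g for g in genotypes if g is not None]
    return len(called) / len(genotypes)


def minor_allele_frequency(genotypes):
    """MAF among called genotypes, where each genotype is an allele count (0/1/2)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)


def passes_snp_qc(genotypes, min_call_rate=0.95, min_maf=0.01):
    """Keep a SNP only if it meets both the call-rate and MAF thresholds."""
    return (call_rate(genotypes) >= min_call_rate
            and minor_allele_frequency(genotypes) >= min_maf)


common_snp = [0] * 90 + [1] * 10            # MAF = 10 / 200 = 0.05
monomorphic_snp = [0] * 100                 # MAF = 0
poorly_called_snp = [0, 1, 2, 1, 0, None]   # call rate = 5 / 6

print(passes_snp_qc(common_snp))
print(passes_snp_qc(monomorphic_snp))
print(passes_snp_qc(poorly_called_snp))
```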

In stage 1, we utilized STRUCTURE 2 to infer percent African ancestry on an individual
level. A total of 2,546 ancestry-informative SNPs from the Illumina array were selected based on
low inter-marker correlation and ability to differentiate between samples of African and
European descent. In evaluating the distribution of the fraction of African ancestry across the
stage 1 populations, statistically significant differences (ANOVA p<10^-16) were noted. We also
applied principal components analysis (PCA) 3 to estimate axes of variation among the 5,761
individuals using the same 2,546 ancestry-informative markers. The first eigenvector accounted
for 10.1% of the variation between subjects, and subsequent eigenvectors accounted for no
more than 0.5%. Using input genotypes from the HapMap populations, CEU (CEPH Utah), YRI
(Yoruba), and JPT (Japanese), we determined that the first eigenvector clearly
differentiates Europeans (CEU) and West Africans (YRI) in the HapMap samples.

In stage 1, we observed no evidence of inflation of the test statistic (λ = 1.01) for the
1,043,036 genotyped and 2,067,098 imputed SNPs analyzed, and no excess of very
small p-values beyond what was expected. We observed no SNP to be associated with disease
status at a genome-wide level of significance (p<5x10^-8) in stage 1. The most statistically
significant association was noted with SNP rs7610073 located in intron 2 of the gene GRM7
(metabotropic glutamate receptor 7) on chromosome 3p26 (risk allele frequency 0.64; OR per
allele = 1.22; p = 7.4x10^-7). A second signal was also noted ~486 kb upstream of GRM7
(rs10510333: risk allele frequency = 0.18; OR per allele = 1.24; p = 8.2x10^-6). The associations
with these 2 markers were independent and remained statistically significant when both were
included in the same model (p-values of 8.3x10^-7 and 9.3x10^-6, respectively).
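The inflation statistic quoted above (the genomic inflation factor, usually written λ) is commonly computed as the median observed 1-df chi-square statistic divided by its expected median under the null. The sketch below shows that calculation from a list of two-sided p-values. It is an illustration of the standard definition, not the code used for the report, and `genomic_inflation` is a hypothetical function name.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
MEDIAN_CHI2_1DF = 0.4549364


def genomic_inflation(p_values):
    """Genomic inflation factor from two-sided p-values (each 0 < p <= 1)."""
    nd = NormalDist()
    # A two-sided p-value maps to a 1-df chi-square statistic via z**2.
    chi2 = [nd.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / MEDIAN_CHI2_1DF


# Evenly spaced p-values mimic the null distribution, so lambda should be ~1.
n = 1001
null_p_values = [(i + 0.5) / n for i in range(n)]
lam = genomic_inflation(null_p_values)
print(round(lam, 3))
```

Values close to 1, such as the 1.01 reported for stage 1, indicate little systematic inflation from population stratification or other sources.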

Task 2: We will perform follow-up genotyping of a minimum of 13,800 SNPs using an Illumina
Infinium iSELECT custom SNP array in 2,000 African American breast cancer cases and 2,000
African American controls. The actual number of SNPs to be examined in stage 2 will depend
on the per chip/sample genotyping cost when stage 2 genotyping will be conducted. Fewer
SNPs were genotyped in Stage 2 because a substantially larger number of samples were
genotyped in Stage 1.


In Stage 2, we genotyped 66 SNPs with association p-values less than 2x10^-4 (from Stage 1) for
replication testing in the stage 2 studies (>3,000 cases and >3,000 controls). None of these
SNPs replicated with stage 2-wide significance of <0.0008 (0.05/66), but 2 replicated with a
p-value <0.05 and an OR in the same direction as that observed in stage 1. Combining results
from stages 1 and 2, no SNP achieved genome-wide significance. The smallest combined
p-values were noted for the two SNPs that replicated in stage 2: rs4322600 located ~100 kb
upstream of the gene GALC (galactosylceramidase) on chromosome 14q31 (risk allele
frequency = 0.78, OR per allele = 1.18, p = 4.3x10^-6) and rs10510333 located ~486 kb upstream
of GRM7 on chromosome 3p26 (risk allele frequency = 0.18, OR per allele = 1.15, p = 1.5x10^-5).
We found no strong statistical evidence that the associations with these two loci differ by ER
status (p-values for heterogeneity in case-only testing: rs10510333: p=0.67; rs4322600:
p=0.85).
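The stage 2 significance threshold quoted above is a Bonferroni correction for the 66 SNPs tested. A minimal worked example of that arithmetic follows; the function name is illustrative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


stage2_threshold = bonferroni_threshold(0.05, 66)
print(f"{stage2_threshold:.5f}")  # about 0.00076, reported as <0.0008
```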


Task  3:  Case-only  analyses  will  be  performed  using  the  combined  data  from  stages  1  +  2  to 
assess  potential  heterogeneity  of  allelic  effects  by  disease  phenotype  (e.g.  ER-  and/or 
aggressive  tumors)  using  a  model  for  exposure  as  a  function  of  genotype  only  for  the  data  from 
the  cases. 

With  only  1,000  ER-negative  cases  included  in  Stage  1,  in  years  3  and  4  we  reached  out  to 
other  ongoing  GWAS  of  ER-negative  disease  in  other  populations.  These  efforts  to  find  loci  for 
ER-negative  disease  are  described  below  and  in  a  number  of  manuscripts  (Haiman  et  al,  Nature 
Genetics,  2012  and  Siddiq  et  al,  Human  Molecular  Genetics,  in  press;  see  Appendix). 4 

Chromosome  5p15 

To search for genetic risk factors for ER-negative breast cancer phenotypes, we initially
combined results from the African American GWAS [AABC: 3,016 cases (1,004 with ER-negative
disease) and 2,745 controls] with results from a GWAS of triple negative breast cancer in
women of European ancestry (TNBCC: 1,562 cases and 3,399 controls) (Haiman et al, Nature
Genetics, 2012, see Appendix). This work took place in years 3 and 4 of the project period.
In TNBCC, cases were genotyped with the Illumina 660W array. Genotypes of TNBCC cases
were compared with GWAS data for publicly available controls. Both studies imputed
genotypes for common SNPs in Phase 2 HapMap populations (release 21), and a total of
3,154,485 SNPs, genotyped and imputed, were analyzed in stage 1 of the meta-analysis.

We observed little evidence of inflation in the test statistics in AABC (λ=1.01), TNBCC
(λ=1.04) or in the meta-analysis of the two GWAS (λ=1.02). In the combined results, only SNP
rs10069690 (NCBI36/hg18, chr5:1,332,790) located in intron 4 of the TERT gene at
chromosome 5p15 displayed a genome-wide significant association with ER negative breast
cancer (AABC: OR per allele=1.32, p=1.3x10^-6; TNBCC: OR=1.25, p=1.2x10^-3; combined
OR=1.29, p=1.0x10^-8). To further confirm the association at 5p15, we genotyped SNP rs10069690
in women of European ancestry, which included 8,365 cases (1,359 ER negatives) and 10,935
controls from the NCI Breast and Prostate Cancer Cohort Consortium (BPC3) and 6,182 cases
(933 ER negatives) and 5,966 controls from Studies of Epidemiology and Risk Factors in
Cancer Heredity (SEARCH). Evidence for replication was observed for rs10069690 and ER
negative breast cancer in both studies (BPC3: OR=1.09, p=0.072; SEARCH: OR=1.21,
p=6.9x10^-4).
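To show how two study-level estimates can yield a combined odds ratio like the one quoted above, the sketch below applies a fixed-effect inverse-variance meta-analysis to the AABC and TNBCC results for rs10069690. The report does not state which meta-analysis method was used, so the fixed-effect model is an assumption. Standard errors are back-calculated from the rounded ORs and p-values in the text, so the output is approximate, and the function names are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist

_ND = NormalDist()


def se_from_or_and_p(odds_ratio, p_value):
    """Back-calculate the standard error of log(OR) from a two-sided Wald p-value."""
    z = abs(_ND.inv_cdf(p_value / 2))
    return abs(log(odds_ratio)) / z


def fixed_effect_meta(studies):
    """Combine (odds_ratio, p_value) pairs; return (combined OR, two-sided p)."""
    weights = []
    weighted_betas = []
    for odds_ratio, p_value in studies:
        se = se_from_or_and_p(odds_ratio, p_value)
        weight = 1 / se ** 2          # inverse-variance weight
        weights.append(weight)
        weighted_betas.append(weight * log(odds_ratio))
    beta = sum(weighted_betas) / sum(weights)
    se = sqrt(1 / sum(weights))
    p = 2 * _ND.cdf(-abs(beta / se))
    return exp(beta), p


# AABC and TNBCC results for rs10069690 and ER negative disease, from the text.
combined_or, combined_p = fixed_effect_meta([(1.32, 1.3e-6), (1.25, 1.2e-3)])
print(round(combined_or, 2), combined_p < 5e-8)
```

Under these assumptions the combined estimate lands close to the reported OR of 1.29 and below the genome-wide threshold; small differences from the reported p-value are expected because the inputs are rounded.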

In an analysis of ER positive cases, rs10069690 was only weakly associated with risk in
African Americans (AABC: 1,558 ER positive cases and 2,743 controls with genotype data:
OR=1.08; p=0.10) and in women of European ancestry (BPC3: 4,890 ER positive cases and
10,397 controls, OR=1.04, p=0.19; SEARCH: 3,534 ER positive cases and 5,966 controls,
OR=1.03, p=0.37; combined for all populations: OR=1.04, p=0.03, pHet = 0.69). The statistical
power to detect an OR of 1.19 (observed for ER negative disease) for ER positive disease was
>99% in the combined sample (9,982 cases and 19,106 controls) assuming the risk allele
frequency of 0.26 in Europeans. This result suggests that the association with breast cancer
might be specific for ER negative subtypes (P-value for case-only test of ER negative versus ER
positive = 1.0x10^-4).

We further stratified the cases by HER2 status to assess whether this region may be a
risk locus for triple negative disease. In AABC, BPC3 and SEARCH the association with
rs10069690 was greater for ER/PR/HER2 negative tumors than for ER/PR negative/HER2
positive tumors, and in combining all studies, including TNBCC, the association with
rs10069690 was significantly greater for triple negative disease [3,706 ER/PR/HER2 negative
cases and 19,728 controls with genotype data, OR=1.25, p=8.6x10^-10; 376 ER/PR
negative/HER2 positive cases and 19,106 controls, OR=1.04, p=0.64, P-value for case-only test
=0.011]. The association with rs10069690 was also observed to be significantly greater for ER
negative and triple negative disease at younger ages (<50 years: ER negative, OR=1.32,
p=7.0x10^-9; triple negative, OR=1.47, p=2.4x10^-9; P for interaction with age = 0.039 and
3.8x10^-3, respectively). We found no significant association with rs10069690 among ER/PR
positive cases when stratified by HER2 status [513 ER/PR/HER2 positive cases and 18,126
controls, OR=1.08, p=0.30; 2,808 ER/PR positive/HER2 negative cases and 18,126 controls,
OR=1.03, p=0.30], which suggests the association may be limited to triple negative disease and
not all HER2 negative tumors.

Chromosome  20q11 

In  order  to  identify  genetic  loci  associated  with  risk  of  ER-negative  breast  cancer,  we  conducted 
a  meta-analysis  of  three  GWAS  of  ER-negative  breast  cancer,  comprising  4,754  cases  and 
31,663  controls  with  further  replication  in  an  additional  11,209  cases  (946  with  ER-negative 
disease)  and  16,057  controls  (Siddiq  et  al,  Human  Molecular  Genetics,  in  press;  see  Appendix). 
This  work  took  place  in  year  4  of  the  study  period. 

The meta-analysis included GWAS of ER-negative breast cancer (4,754 ER-negative
cases and 31,663 controls) from the NCI Breast and Prostate Cancer Cohort Consortium
(BPC3) (2,188 ER-negative cases and 25,519 controls of European ancestry), the Triple
Negative Breast Cancer Consortium (TNBCC) (1,562 triple negative cases and 3,399 controls of
European ancestry) and the African American Breast Cancer Consortium (AABC) (1,004
ER-negative cases and 2,745 controls). We observed little evidence of inflation in the test
statistics (λ < 1.04 for each study; λ=1.04 for meta-analysis). A total of 86 SNPs were
associated with ER-negative breast cancer at P < 10^-5. An in silico replication of the 86 SNPs
was conducted using GWAS of European (BCAC combined), Latino (MEC-LAT,
SFBCS/NC-BCFR) and Japanese (MEC-JPT) ancestry populations, totaling 11,209 breast cancer
cases (946 with ER-negative disease) and 8,404 controls (Stage 2).

Combining results for ER-negative breast cancer from stages 1 and 2, variants in three
regions showed genome-wide significance [20q11-rs2284378, T allele: odds ratio, OR=1.16,
P = 1.1x10^-8 (PGC = 7.7x10^-8; Table 1); 19p13-rs8100241, G allele: OR=1.14, P=3.5x10^-8;
6q25-rs9383938, T allele: OR=1.28, P = 2.37x10^-10]. Variants at 6q25 have previously been
associated with breast cancer risk 5, and variants at the 19p13 locus have been associated with
ER-negative and triple negative breast cancer risk 6,7. The rs2284378 variant at 20q11 is located
in a region containing RALY (RNA binding protein, autoantigenic) and EIF2S2 (eukaryotic
translation initiation factor 2, subunit 2 beta), ~100 kb upstream of ASIP (agouti signaling
protein), and is in high linkage disequilibrium (r2=0.96 and D'=1) with rs4911414, which has
been associated with melanoma and basal cell carcinoma. 8-10 The T allele at rs2284378 was
associated with an increased ER-negative breast cancer risk (OR>1) in all racial/ethnic
populations except Japanese (OR=0.99); however, this group had the smallest sample size.
Furthermore, no significant evidence of heterogeneity was observed by race (P=0.28) or study
(P=0.54). When the study was extended to include all available breast cancer cases
(ER-positive and ER-negative) and controls from the participating GWAS, rs2284378 showed a
weaker association with overall breast cancer (OR=1.08, P=1.3x10^-6 based on 17,868 cases
and 43,744 controls) and no evidence for association with ER-positive disease (OR=1.01,
P=0.67 based on 9,965 cases and 22,902 controls). A case-only analysis of ER-negative versus
ER-positive breast cancer indicated a highly significant difference in ORs by ER status
(P=1.3x10^-4). Furthermore, rs2284378 appeared more strongly associated with triple negative
breast cancer (OR=1.16; P=6.4x10^-3) than ER-negative, PR-negative, HER2-positive breast
cancer (OR=1.07, P=0.41), although these differences were not statistically significant
(case-only P=0.44).

Next, we examined the associations between all candidate loci from stage 1 (n=86
SNPs) and overall breast cancer risk using all available breast cancer cases and controls from
the studies in stages 1 and 2. We identified genome-wide statistically significant associations
with variants at 6q25 (rs9383938, T allele: OR=1.20; P=8.7x10^-14), and a recently reported risk
locus near the PTHLH gene at 12p11 11 (rs1975930, T allele: OR=1.22; P=1.4x10^-13). In
addition, we observed genome-wide significant associations with multiple variants in a gene
desert located at 6q14. Allele C of rs17530068 at 6q14 was associated with increased risk of
overall breast cancer (OR=1.12; P=1.1x10^-9; PGC=9.4x10^-9) and of both ER-positive
(OR=1.09; P=1.5x10^-5) and ER-negative (OR=1.16, P=2.5x10^-7) breast cancer. We observed
no evidence of risk heterogeneity for rs17530068 by ER status (case-only analysis P=0.53),
study (Phet=0.16) or race/ethnicity (Phet=0.30). Furthermore, rs17530068 appeared more
strongly associated with ER-negative, PR-negative, HER2-positive breast cancer (OR=1.26,
P=8.0x10^-3) than triple negative breast cancer (OR=1.12, P=0.07), although these differences
were not statistically significant (case-only P=0.17).

Fine-mapping of Known Breast Cancer Risk Loci. This study does not fall under any of the
Tasks specifically outlined in the Statement of Work; however, it is a logical extension of our
work and makes good use of the dense genome-wide SNP data generated in Stage 1 of the scan. A
manuscript describing these findings is provided in the Appendix (Haiman et al, Human
Molecular Genetics, 2012). 12 This work started in year 3 and was completed in year 4 of
the study period.

We  tested  common  genetic  variation  at  the  breast  cancer  risk  loci  identified  in  women  of 
European  and  Asian  descent  in  the  stage  1  African  American  breast  cancer  sample  to  identify 
markers  of  risk  that  are  relevant  to  this  population.  More  specifically,  we  examined  the  index 
variants  and  conducted  fine-mapping  of  the  locus  to  both  improve  the  current  set  of  risk  markers 
in  African  Americans  as  well  as  to  identify  new  risk  variants  for  breast  cancer.  We  then  applied 
this information to model breast cancer risk in African American women in an attempt to
characterize  the  spectrum  of  genetic  risk  in  this  population  defined  by  common  variants  at  the 
known  risk  loci. 

We tested the 19 validated breast cancer risk variants (referred to as “index variants”) at
1p11, 2q35, 3p24, 5p12, 5q11, 6q25, 8q24, 9p21, 9q31, 10p15, 10q21, 10q22, 10q26, 11p15,
11q13, 14q24, 16q12, 17q23 and 19p13 in models adjusted for age, study, global ancestry (the
first 10 eigenvectors) and local ancestry.5,12-17 Seventeen SNPs were directly genotyped, while
2 were imputed using MACH (r2>0.98). All 19 variants were common (frequency >0.05) in African
Americans, with 11 variants being more common in Europeans than African Americans. In
previous GWAS, the index signals had very modest odds ratios (1.05-1.29 per copy of the risk
allele) and our sample size provided >70% statistical power to detect the reported effects for
12 of the 19 variants (at P<0.05). We observed positive associations with 11 of the 19 variants
(OR>1); however, only 4 were statistically significant (P<0.05 at 2q35, 9q31, 10q26 and 19p13).
Of the 15 variants that were not replicated at P<0.05, statistical power was <70% for only 7 of
the variants. Although power was more limited, we also evaluated associations by estrogen
receptor (ER) status, as some risk variants have been found to be more strongly associated with
ER-positive (ER+) or ER-negative (ER-) breast cancer. We observed positive associations with
12 variants (2 at P<0.05) for ER+ disease (n=1,520) and with 9 variants for ER- (3 at P<0.05;
n=988). For only one variant did we observe statistically significant risk heterogeneity by ER
status (rs13387042 at 2q35, P=0.013).

Aside from statistical power, the lack of a statistically significant association with an
index variant (OR>1 and p<0.05) suggests that the particular variant revealed in the GWAS
populations may not be adequately correlated with the biologically relevant allele in African
Americans. In an attempt to identify a better genetic marker of risk in African Americans we
conducted fine-mapping across all risk regions using genotyped SNPs on the Illumina 1M array
and imputed SNPs to Phase 2 HapMap populations. Through fine-mapping we revealed
markers in four regions that were more significantly associated with risk than the index signal
(>1 order of magnitude change in the p-value) and are likely capturing the same signal (2q35,
5q11, 10q26 and 19p13). We also identified markers in four regions that are not correlated with
the index signal in the GWAS populations (8q24, 10q22, 11q13 and 16q12) and may represent
putative novel risk variants, with one being specific for ER+ disease (8q24). These regions are
discussed below.
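Correlation between a fine-mapped marker and an index variant is expressed throughout this section as the linkage disequilibrium statistic r2. As a reminder of what that number measures, the sketch below computes r2 for two biallelic SNPs from the four phased haplotype counts. The function and argument names are illustrative, and the counts in the examples are made up for demonstration rather than taken from HapMap.

```python
def ld_r_squared(n_ab, n_aB, n_Ab, n_AB):
    """r^2 between two biallelic SNPs from counts of the four haplotypes.

    n_ab counts haplotypes carrying allele a at SNP 1 and allele b at SNP 2,
    n_aB carries a and B, n_Ab carries A and b, and n_AB carries A and B.
    """
    total = n_ab + n_aB + n_Ab + n_AB
    p_ab = n_ab / total
    p_a = (n_ab + n_aB) / total   # frequency of allele a at SNP 1
    p_b = (n_ab + n_Ab) / total   # frequency of allele b at SNP 2
    d = p_ab - p_a * p_b          # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0                # a monomorphic SNP carries no correlation
    return d * d / denom


print(ld_r_squared(50, 0, 0, 50))     # alleles always travel together
print(ld_r_squared(25, 25, 25, 25))   # alleles are independent
print(ld_r_squared(100, 0, 0, 0))     # both SNPs monomorphic
```

Because r2 depends on allele and haplotype frequencies, the same pair of SNPs can be well correlated in one population (for example CEU) and nearly uncorrelated in another (for example YRI), which is the motivation for fine-mapping in African Americans.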

Risk  variants  that  better  define  the  index  signal  in  African  Americans 

2q35 

The index signal at 2q35 was statistically significantly associated with risk of overall breast
cancer (rs13387042: OR=1.12, P=7.5x10^-3) and ER+ disease (OR=1.22, P=2.6x10^-4). However,
we found stronger associations with two markers that are each modestly correlated with the
index signal in CEU and YRI: rs13000023 with overall breast cancer (OR=1.20, P=5.8x10^-4) and
rs12998806 with ER+ disease (OR=1.39, P=3.3x10^-6). The signal in this region appeared
limited to ER+ breast cancer, which is consistent with the initial report of this risk locus.15

5q11

We found a positive non-significant association with the index signal at 5q11, which is located
79 kb centromeric of the MAP3K1 gene (rs889312: OR=1.07, P=0.084). Fine-mapping revealed
statistically significant associations with markers rs16886165 for overall breast cancer
(OR=1.15, P=6.5x10^-4) and rs832529 for ER- disease (OR=1.22, P=1.3x10^-3). These SNPs
show greater correlation with the index signal in Europeans (CEU, r2=0.40 and 0.46) than in
Africans (YRI, r2<0.01 and r2=0.09), which suggests that they may be better markers of the
biologically functional variant in African Americans.

10q26 

Both the index signal, rs2981582 (OR=1.11, P=8.6x10^-3), and rs2981578 (OR=1.24,
P=1.7x10^-4), which was identified previously through fine-mapping in African Americans (to
which some of these studies contributed)18, were statistically significantly associated with risk.
Variant rs2981578 was the most strongly associated marker in the region for overall breast
cancer and for ER+ disease, which is consistent with previous reports of variation in this region
being more strongly associated with ER+ breast cancer.19 In fine-mapping the locus we observed
a suggestive association with a correlated marker and ER- disease (rs2912774: OR=1.19,
P=2.1x10^-3); however, the association was also noted with ER+ disease (OR=1.10, P=0.041) and
is likely capturing the same signal as rs2981578.

19p13 

19p13 was the first risk locus reported to harbor a variant that may be specific for ER- disease.20
In African Americans, the index variant was statistically significantly associated with risk of
overall breast cancer (rs2363956: OR=1.14, P=8.0x10^-4), as well as ER+ (OR=1.12, P=0.016)
and ER- disease (OR=1.14, P=0.01). The most significant association in the region for overall
breast cancer and ER+ disease was with rs3745185 (P=3.7x10^-5 and P=8.2x10^-4, respectively),
which is likely to be capturing the same functional variant (r2=0.57 in CEU and 0.19 in YRI). The
most significant marker for ER- breast cancer was correlated with both rs2363956 and
rs3745185 (rs11668840: OR=1.25, P=5.1x10^-5).

Novel  risk-associated  markers  at  breast  cancer  susceptibility  loci. 

8q24 

Given the importance of the 8q24 locus in cancer, we conducted association testing across the
entire cancer risk region (126.0 Mb-130.0 Mb).21,22 The index signal (rs13281615) was not
statistically significantly associated with risk in African Americans, nor did we identify significant
associations with correlated SNPs. However, we did detect a significant association with
rs16902056 and ER+ breast cancer (risk allele frequency, 0.95; P=6.7x10^-6; ER-: P=0.66). This
SNP is located 78 kb centromeric of the index variant and is not correlated with the index variant
(r2<0.01 in CEU and r2=0.027 in YRI). No statistically significant associations were observed
with variants found previously in association with cancers of the bladder and ovary, or leukemia
(rs9642880: OR=1.03, P=0.58; rs10088218: OR=1.02, P=0.62; rs2456449: OR=1.07, P=0.14).
Of the known risk variants for prostate cancer we found a single nominally significant (P<0.05)
association with the same risk allele of rs1016343 (P=0.015), which is located >260 kb
centromeric of the breast cancer risk region and is not correlated with rs13281615 or
rs16902056.

10q22 

We observed no association with the index signal at 10q22 (rs704010), which is located in intron
1 of the gene ZMIZ1, or with any correlated markers. However, we did detect strong evidence of
a second signal located 215 kb telomeric in intron 12 of the gene ZMIZ1 (rs12355688: OR=1.24,
P=6.8x10^-6). This putative novel risk variant is not correlated with the index variant in the CEU
or YRI populations (r2<0.01).

11q13 

No positive association was noted with the index variant at 11q13. However, we did detect
evidence of a second independent signal (rs609275: OR=1.20, P=1.0x10^-5), located 74 kb
telomeric, and 53 kb centromeric of CCND1. The variant is monomorphic and uncorrelated with
the index signal in the CEU population, and r2 with the index signal in the YRI population is
<0.01.

16q12 

As in previous studies of African Americans we were not able to replicate the association signal
defined by the index variant rs3803662.23,24 A recent study of African Americans reported a
suggestive association with SNP rs3104746, which is located 15 kb telomeric of rs3803662.25
This SNP has a minor allele frequency of 0.04 in the HapMap CEU population, 0.19 in our
African American controls, and is modestly correlated with rs3803662 in Africans (r2=0.31 in
YRI), but not in Europeans (r2=0.038). Fine-mapping around this putative signal revealed a
perfect proxy (r2=1) for rs3104746, rs3112572, which is significantly associated with breast
cancer risk in African Americans (OR=1.18, P=3.9x10^-4) with the association noted to be
stronger for ER+ breast cancer (OR=1.27, P=3.1x10^-5).

For  index  SNPs  found  to  be  nominally  associated  with  breast  cancer  risk,  as  well  as  risk- 
associated  markers  identified  through  fine-mapping,  we  also  tested  for  associations  by 
genotype. Results from the genotype-specific model were consistent with log-additive
associations. Risk variants at 2q35 and 8q24 were also found to have significantly stronger
associations  with  ER+  breast  cancer  than  ER-  disease  which  is  consistent  with  previous 
studies.19 

We observed no statistically significant associations with common variation at 10 risk loci
on 1p11, 3p24, 5p12, 6q25, 9p21, 10p15, 10q21, 11p15, 14q24 and 17q23.


Risk  modeling 

In this study we also estimated the cumulative effect of all breast cancer risk variants, and
compared a summary risk score composed of unweighted counts of all GWAS-reported risk
variants to a risk score that included variants we identified as being associated with risk in
African Americans. Using the 19 index signals from GWAS, the risk per allele was 1.04 (95% CI,
1.02-1.06; P=6.1x10^-5) and individuals in the top quintile of the risk allele distribution were at
1.4-fold greater risk (P=7.4x10^-5) of breast cancer compared to those in the lowest quartile. As
expected, the risk score was improved when utilizing the markers that we identified at the
known risk loci as being more relevant to African Americans (8 alleles for overall breast cancer:
2q35, 5q11, 9q31, 10q22, 10q26, 11q13, 16q12 and 19p13; OR=1.18; 95% CI, 1.14-1.22;
P=2.8x10^-24), with risk for those in the top quartile being 2.2-times that observed in the lowest
quintile (P=3.6x10^-17). We observed an increase of 1.9 percentage points in the area under the
curve (AUC) (P=2.6x10^-6). This score was significantly associated with risk of both ER+
(OR=1.20, P=1.7x10^-19) and ER- (OR=1.15, P=2.8x10^-9) disease (Phet=0.12).
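The unweighted score described above can be sketched as follows. This is a minimal illustration rather than the study's analysis code: the 0/1/2 genotype coding, the function names and the two example genotype vectors are assumptions, and the only figure taken from the text is the per-allele odds ratio of 1.18.

```python
from typing import Sequence


def risk_score(risk_allele_counts: Sequence[int]) -> int:
    """Unweighted score: total number of risk alleles carried across variants."""
    if any(count not in (0, 1, 2) for count in risk_allele_counts):
        raise ValueError("each variant must carry 0, 1 or 2 risk alleles")
    return sum(risk_allele_counts)


def relative_odds(per_allele_or: float, extra_alleles: int) -> float:
    """Odds ratio implied by carrying `extra_alleles` more risk alleles,
    assuming each allele multiplies the odds by the same factor (log-additive)."""
    return per_allele_or ** extra_alleles


# Hypothetical genotypes at 8 variants for two women (not study data).
woman_a = [2, 1, 1, 2, 1, 1, 2, 1]
woman_b = [1, 0, 1, 1, 0, 1, 1, 1]

difference = risk_score(woman_a) - risk_score(woman_b)
print(difference, round(relative_odds(1.18, difference), 2))
```

Because the score is unweighted, every variant is treated as having the same effect; a weighted score would instead multiply each count by that variant's own log odds ratio.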

Future  Work  to  Better  Address  the  Topic:  Additional  Ongoing  Efforts  to  Reveal  Loci  for 
Breast  Cancer  in  Women  of  African  Ancestry. 

We are currently conducting additional meta-analyses and follow-up genotyping with new
studies of breast cancer in African ancestry populations. In October of 2012, we will be
meta-analyzing GWAS results from our AABC GWAS with a GWAS of breast cancer in Nigerian
women (>1,000 cases and >1,000 controls). The 50,000 most significant associations from the
meta-analysis will be included on a custom iSelect array to be genotyped by the AMBER breast
cancer consortium (>3,000 cases and >3,000 controls). We expect findings from this work to
reveal additional loci for overall breast cancer and ER-negative disease that are important for
women of African ancestry. The custom array will also include SNP content (~80,000 SNPs) for
fine-mapping of the ~80 known breast cancer risk loci in this population.

Key  Research  Accomplishments 

• Established a consortium to study breast cancer among women of African ancestry

•  Conducted  the  first  genome-wide  association  study  of  breast  cancer  among  African 
American  women 

•  Ruled  out  common  genetic  variants  with  large  effects  as  contributors  to  breast  cancer 
risk  in  women  of  African  ancestry 

•  Pulled  together  all  existing  GWAS  of  ER-negative  breast  cancer  for  meta-analysis 

•  Identified  three  susceptibility  loci  for  breast  cancer  with  two  being  specific  for  ER- 
negative  breast  cancer 

•  Identified  a  locus  for  ER-negative  breast  cancer  that  contributes  to  greater  risk  of  ER- 
negative  disease  and  triple  negative  disease  in  women  of  African  ancestry 

•  Via  fine-mapping  we  improved  upon  markers  of  risk  at  known  susceptibility  loci  that 
better  characterize  their  contribution  to  breast  cancer  risk  in  women  of  African  ancestry 


Reportable  Outcomes  and  Studies  that  have  Emanated  from  the  GWAS  of  Breast  Cancer 
in  Women  of  African  Ancestry. 

Manuscripts  (provided  in  Appendix): 

A  genome-wide  association  study  of  breast  cancer  in  women  of  African  ancestry. 

Chen F, Chen GK, Stram DO, Millikan RC, Ambrosone CB, John EM, Bernstein L, Zheng W,
Palmer JR, Hu JJ, Rebbeck TR, Ziegler RG, Nyante S, Bandera EV, Ingles SA, Press MF,
Ruiz-Narvaez EA, Deming SL, Rodriguez-Gil JL, DeMichele A, Chanock SJ, Blot W, Signorello L,
Cai Q, Li G, Long J, Huo D, Zheng Y, Cox NJ, Olopade OI, Ogundiran TO, Adebamowo C,
Nathanson KL, Domchek SM, Simon MS, Hennis A, Nemesure B, Wu SY, Leske MC, Ambs S,
Hutter CM, Young A, Kooperberg C, Peters U, Rhie SK, Wan P, Sheng X, Pooler LC, Van Den
Berg DJ, Le Marchand L, Kolonel LN, Henderson BE, Haiman CA.

Human  Genetics  2012  Aug  25 

A  common  variant  at  the  TERT-CLPTM1L  locus  is  associated  with  estrogen  receptor-negative 
breast  cancer. 
Conclusion


Genome-wide  studies  of  common  and  rare  genetic  variation  conducted  in  multiple  populations 
will  be  required  to  reveal  the  complete  spectrum  of  susceptibility  alleles  that  contribute  to  risk  of 
breast  cancer  globally.  In  a  genome-wide  scan  of  common  genetic  variation  in  >3,000  African 
American  cases  and  >2,700  controls,  followed  by  replication  testing  of  the  most  significant 
associations (p<10^-4) in an independent set of >3,000 cases and >3,000 controls, we identified
two suggestive associations with breast cancer risk that replicated in stage 2 at p<0.05
[chromosome 14q31 (p = 4.3x10^-6) and 3p26 (p = 1.5x10^-5)]; however, these associations did
not  reach  the  standard  level  of  genome-wide  significance.  These  regions  have  not  been 
highlighted  in  previous  GWAS  conducted  in  other  racial/ethnic  populations  and  each  association 
requires  further  validation  in  additional  studies.  A  strength  of  the  2-stage  GWAS  we  conducted 
is  that  it  includes  most  existing  case-control  studies  of  breast  cancer  conducted  in  women  of 
African  ancestry.  In  this  2-stage  design,  we  had  80%  statistical  power  to  identify  a  common  risk 
variant (frequency of >10%) that conveys a risk per allele of 1.3 at genome-wide significance
(p=5x10^-8). Thus, we were able to rule out variants with large effects if they were among the top
0.007%  in  stage  1  (and  thus  taken  to  stage  2)  and  were  adequately  tagged  by  the  common 
SNPs  on  the  1M  array.  However,  we  are  likely  to  have  missed  some  milder  associations.  In 
previous  GWAS  of  breast  cancer  in  European  ancestry  populations,  most  risk  variants 
eventually  identified  were  not  among  the  most  statistically  significant  in  stage  1  and  were  only 
revealed through testing of large numbers of SNPs in additional replication stages. Identifying
novel risk loci for overall breast cancer in African ancestry populations will require continued
collaborative efforts and investigators willing to test larger numbers of SNPs in their respective
studies.

In our meta-analyses of GWAS for ER-negative breast cancer we identified three novel
loci for breast cancer, two of which are specific for ER-negative disease. SNP rs17530068 at
chromosome 6q14 was associated with overall breast cancer risk and showed no differential
association depending on ER status. The association of SNP rs2284378 at 20q11, however,
was stronger for ER-negative than ER-positive breast cancer. SNP rs10069690 at 5p15 also
appeared to be more strongly associated with ER-negative and triple negative disease.
Identification of the variants directly responsible for the association will be required to fully
address the extent to which these loci contribute to the greater incidence of ER-negative and
triple negative tumors in women of African ancestry. However, it is notable that the risk allele
frequency of rs10069690 is greater in African American women (frequency, 0.57) than in women
of European ancestry (frequency, 0.26). If this variant is an equally good surrogate for the
biologically functional allele in each population, then this locus may be responsible for a 15%
(95% CI, 10-20%) increase in the incidence rate of ER-negative or triple negative breast cancer
in women of African ancestry compared to women of European ancestry. Larger studies with
well-characterized tumor pathology information will be needed to determine whether the
associations we observed apply to other breast cancer subtypes. Furthermore, our findings
provide further support for the presence of genetic susceptibility to ER-negative breast cancer
subtypes.
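The reasoning behind an incidence difference driven by allele frequency alone can be sketched as below. This is an illustration under stated assumptions, not the calculation used in the report: it assumes Hardy-Weinberg proportions, a log-additive effect, and identical per-allele relative risk in both populations. The allele frequencies (0.57 and 0.26) come from the text; the per-allele relative risk of 1.25 is a hypothetical input chosen for the example, since this section does not state one.

```python
def mean_relative_risk(risk_allele_freq: float, per_allele_rr: float) -> float:
    """Population-average risk relative to non-carriers.

    Under Hardy-Weinberg proportions with frequency p and a log-additive
    per-allele relative risk r, the average over genotypes is
    (1-p)^2 + 2p(1-p)r + p^2 r^2 = (1 - p + p*r)^2.
    """
    return (1.0 - risk_allele_freq + risk_allele_freq * per_allele_rr) ** 2


def incidence_ratio(freq_a: float, freq_b: float, per_allele_rr: float) -> float:
    """Ratio of average risk in population A to population B, all else equal."""
    return (mean_relative_risk(freq_a, per_allele_rr)
            / mean_relative_risk(freq_b, per_allele_rr))


ASSUMED_PER_ALLELE_RR = 1.25  # hypothetical value, not taken from this section
ratio = incidence_ratio(0.57, 0.26, ASSUMED_PER_ALLELE_RR)
print(f"incidence ratio: {ratio:.3f}")
```

With these inputs the ratio is roughly 1.15; a smaller assumed per-allele relative risk gives a correspondingly smaller excess.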

“So  What?” 

Identifying  new  loci  associated  with  ER-negative  and  triple  negative  breast  cancer  will  continue 
to  provide  insight  into  the  biological  mechanisms  underlying  this  more  aggressive  form  of  breast 
cancer,  and  could  result  in  improvements  in  risk  prediction  and  treatment. 
Abstract Genome-wide association studies (GWAS) in
diverse populations are needed to reveal variants that are
more common and/or limited to defined populations. We
conducted a GWAS of breast cancer in women of African
ancestry, with genotyping of >1,000,000 SNPs in 3,153
African American cases and 2,831 controls, and replication
testing of the top 66 associations in an additional 3,607
breast cancer cases and 11,330 controls of African ancestry.
Two of the 66 SNPs replicated (p < 0.05) in stage 2,
which reached statistical significance levels of 10^-6 and
10^-5 in the stage 1 and 2 combined analysis (rs4322600 at
chromosome 14q31: OR = 1.18, p = 4.3 x 10^-6; rs10510333
at chromosome 3p26: OR = 1.15, p = 1.5 x 10^-5).
These  suggestive  risk  loci  have  not  been  identified  in 

L.  Bernstein 

Division  of  Cancer  Etiology,  Department  of  Population  Science, 
Beckman  Research  Institute,  City  of  Hope,  Duarte,  CA,  USA 

W. Zheng • S. L. Deming • W. Blot • L. Signorello • Q. Cai •
G. Li • J. Long

Division  of  Epidemiology,  Department  of  Medicine, 

Vanderbilt  Epidemiology  Center,  Vanderbilt-Ingram  Cancer 
Center,  Vanderbilt  University  School  of  Medicine, 

Nashville,  TN,  USA 

J.  R.  Palmer  •  E.  A.  Ruiz-Narvaez 

Slone  Epidemiology  Center  at  Boston  University, 

Boston,  MA,  USA 

J.  J.  Hu  •  J.  L.  Rodriguez-Gil 

Department  of  Epidemiology  and  Public  Health,  Sylvester 
Comprehensive  Cancer  Center,  University  of  Miami  Miller 
School  of  Medicine,  Miami,  FL,  USA 

T.  R.  Rebbeck  •  A.  DeMichele  •  K.  L.  Nathanson  • 

S.  M.  Domchek 

University  of  Pennsylvania  School  of  Medicine, 

Philadelphia,  PA,  USA 


E.  M.  John 

Stanford  University  School  of  Medicine, 
Stanford  Cancer  Institute,  Stanford,  CA,  USA 


Published  online:  25  August  2012 


Springer


Hum  Genet 


previous  GWAS  in  other  populations  and  will  need  to  be 
examined  in  additional  samples.  Identification  of  novel  risk 
variants  for  breast  cancer  in  women  of  African  ancestry 
will  demand  testing  of  a  substantially  larger  set  of  markers 
from  stage  1  in  a  larger  replication  sample. 

Introduction 

Genome-wide association studies (GWAS) of breast cancer
have been conducted almost exclusively in populations of
European ancestry, and have firmly established associations
with a number of common susceptibility loci that
contribute modest effects (relative risks <1.3) (Ahmed
et al. 2009; Antoniou et al. 2010; Easton et al. 2007;
Fletcher et al. 2011; Ghoussaini et al. 2012; Haiman
et al. 2011b; Hunter et al. 2007; Kim et al. 2012; Long
et al. 2012; Stacey et al. 2007, 2008; Thomas et al. 2009;
Turnbull et al. 2010; Zheng et al. 2009b). These discoveries
provide support for the polygenic model of breast
cancer susceptibility (Pharoah et al. 2002), as well as clues
as to important biological pathways involved in the
pathogenesis of breast cancer. For example, the most strongly
associated risk locus for breast cancer revealed through
GWAS has been the region containing the fibroblast
growth factor receptor 2 (FGFR2) at chromosome 10q26
(Easton et al. 2007; Hunter et al. 2007; Meyer et al. 2008).
FGFR2 is a member of the FGFR family of receptor
tyrosine kinases (RTKs), which regulate cell proliferation,
differentiation and apoptosis (Tenhagen et al. 2012). The
risk variant on chromosome 14q24 is located in intron 12
of RAD51B, which is a member of the RAD51 protein
family. RAD51 proteins are essential for DNA repair by
homologous recombination (Tarsounas et al. 2004), a DNA
repair pathway with an established and important role in
breast cancer development. A more recent study, which
included African American subjects from the current study,
revealed a risk marker at the locus encoding telomerase
reverse transcriptase (TERT) (Haiman et al. 2011b), a protein that
controls telomere length and is also implicated in
oncogenesis (Kim et al. 1994). Many of the risk variants
identified by GWAS, however, are located in gene deserts, or
near genes with roles in breast cancer etiology that are
currently unknown.

The  search  for  additional  low  penetrance  alleles  for 
breast  cancer  in  specific  racial/ethnic  populations  has 
revealed  additional  variants  that  are  important  globally  or 
more  common  and/or  limited  to  defined  populations.  For 
example,  a  GWAS  conducted  among  Chinese  women 
identified  a  novel  risk  locus  for  breast  cancer  near  the  gene 
for  the  estrogen  receptor  (ER)  on  chromosome  6  which  had 
not  been  revealed  in  previous,  well-powered  GWAS  in 
populations of European ancestry (Zheng et al. 2009b).
A GWAS of prostate cancer in men of African ancestry also
identified a novel risk variant at 17q21 that is not observed in
other populations (Haiman et al. 2011a). In search of risk
variants  for  breast  cancer  that  may  be  important  to  women  of 
African  ancestry,  we  analyzed  >1  million  common  SNPs  in 
3,153  African  American  breast  cancer  cases  and  2,831 
African American controls, and examined the most statistically
significant associations in a second stage of 3,607 cases
and 11,330 controls of African ancestry.

Materials  and  methods 

Study  populations 

Stage 1 of the GWAS included African American participants
from 9 epidemiological studies of breast cancer,
comprising a total of 3,153 cases and 2,831 controls (cases/
controls: The Multiethnic Cohort study (MEC), 734/1,003;
The Los Angeles component of The Women's Contraceptive
and Reproductive Experiences (CARE) Study,
380/224; The Women's Circle of Health Study (WCHS),
272/240; The San Francisco Bay Area Breast Cancer Study
(SFBCS), 172/231; The Northern California Breast Cancer
Family Registry (NC-BCFR), 440/53; The Carolina Breast
Cancer Study (CBCS), 656/608; The Prostate, Lung,
Colorectal, and Ovarian Cancer Screening Trial (PLCO)
Cohort, 64/133; The Nashville Breast Health Study
(NBHS), 310/186; and The Wake Forest University Breast
Cancer Study (WFBC), 125/153). Replication testing was
conducted in an independent sample of 3,607 breast cancer
cases and 11,330 controls from 9 additional studies of
breast cancer in women of African ancestry (The Black
Women's Health Study (BWHS), 826/1,167; The
Women's Insights and Shared Experiences study (WISE),
174/458; NBHS/Southern Community Cohort (SCCS),
981/851; The Nigerian Breast Cancer Study (NBCS),
681/282; The Barbados National Cancer Study (BNCS),
93/244; The Racial Variability in Genotypic Determinants
of Breast Cancer Risk Study (RVGBC), 151/272; The
Baltimore Breast Cancer Study (BBCS), 117/111; The Chicago
Cancer Prone Study (CCPS), 268/261; and The Women's
Health Initiative (WHI), 316/7,484).

Sample  size  and  selected  characteristics  for  these  studies 
are  summarized  in  Supplemental  Tables  1  and  2  and 
detailed  information  about  the  design  and  organization  of 
each  study  is  provided  in  supporting  information. 

Genotyping  and  quality  control 

Genotyping in stage 1 was conducted using the Illumina
Human1M-Duo BeadChip. Of the 5,984 samples from
these studies (3,153 cases and 2,831 controls), we
attempted genotyping of 5,932, removing samples
(n = 52) with DNA concentrations <20 ng/µl. Following
genotyping, we removed samples based on the following
exclusion criteria: (1) unexpected replicates (>98.9 %
genetically identical) that we were able to confirm through
discussions with study investigators (only one of each
replicate was removed, n = 15); (2) unknown replicates that
we were not able to confirm (pair or triplicate removed,
n = 14); (3) samples with call rates <95 % after a second
genotyping attempt (n = 100); (4) samples with <5 % African
ancestry (n = 36) (discussed below); and (5) samples with
<15 % mean heterozygosity of SNPs on the X chromosome
and/or similar mean allele intensities of SNPs on the X and Y
chromosomes (n = 6), as these are likely to be males.

We removed SNPs with <95 % call rate (n = 21,732)
or minor allele frequencies (MAFs) <1 % (n = 80,193).
To  assess  genotyping  reproducibility,  we  included  138 
known  replicate  samples;  the  average  concordance  rate  was 
99.95  %  (>99.93  %  for  all  pairs).  We  also  eliminated 
SNPs  with  genotyping  concordance  rates  <98  %  based  on 
the  replicates  (n  =  11,701).  The  final  analysis  dataset 
included  1,043,036  SNPs  genotyped  on  3,016  cases  and 
2,745  controls,  with  an  average  SNP  call  rate  of  99.7  % 
and  average  sample  call  rate  of  99.8  %.  Hardy-Weinberg 
equilibrium  (HWE)  was  not  used  as  a  criterion  for 
removing  SNPs;  none  of  the  SNPs  selected  for  replication 
deviated  from  HWE  in  controls  in  each  study  (based  on  a 
cut-off  of  p  <  0.001). 
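The two per-SNP filters above (call rate and minor allele frequency) can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the in-memory genotype representation (0, 1 or 2 copies of a counted allele, `None` for a failed call) and the function names are assumptions; only the 95 % and 1 % thresholds come from the text.

```python
from typing import Optional, Sequence

Genotype = Optional[int]  # 0, 1 or 2 copies of the counted allele; None = no call


def call_rate(genotypes: Sequence[Genotype]) -> float:
    """Fraction of samples with a successful genotype call."""
    if not genotypes:
        return 0.0
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes: Sequence[Genotype]) -> float:
    """Frequency of the rarer allele among called genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1.0 - freq)


def passes_snp_qc(genotypes: Sequence[Genotype],
                  min_call_rate: float = 0.95,
                  min_maf: float = 0.01) -> bool:
    """Keep a SNP only if it meets both thresholds."""
    return (call_rate(genotypes) >= min_call_rate
            and minor_allele_frequency(genotypes) >= min_maf)
```

A SNP with no variation in the sample has a minor allele frequency of zero and is dropped by the second filter even when every genotype was called.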

We selected 66 SNPs with p values <2 x 10^-4 in stage
1 for evaluation in the second stage. These SNPs were
selected from 53 regions following linkage disequilibrium
(LD) pruning of correlated SNPs. Two of these SNPs were
located near a previously validated breast cancer risk locus
[rs12355688 at 10q22, 241 kb downstream of rs704010,
r2 = 0 in both CEU and YRI populations from 1000
Genomes Project (March 2010 release) (Turnbull et al.
2010); and rs3745185 at 19p13, 10 kb downstream of
rs2363956, r2 = 0.57 and 0.19 in the CEU and YRI populations
from 1000 Genomes Project (March 2010 release),
respectively (Antoniou et al. 2010)]. Genotyping in the
replication studies was performed using the Sequenom
platform (BWHS), OpenArray (WISE and NBHS/SCCS),
the Affymetrix 6.0 SNP array (WHI) (Hutter et al. 2011)
and Illumina GoldenGate (all other studies) (see Supporting
Information). Blinded duplicate samples (5-10 %)
were included in the replication studies and concordance of
these samples was >98 % in all studies. The number of
SNPs that were genotyped successfully in each stage 2
study ranged from 51 to 63. The average call rate for all
SNPs in stage 2 was 98.8 % (range for call rates of a SNP
within study 71.4-100 %). Call rates by SNP and study are
shown in Supplemental Table 3.
Estimation of African ancestry

In stage 1, we utilized STRUCTURE (Pritchard et al. 2000)
to infer percent African ancestry on an individual level. A
total of 2,546 ancestry-informative SNPs from the Illumina
array were selected based on low inter-marker correlation
and ability to differentiate between samples of African and
European descent. In evaluating the distribution of the
fraction of African ancestry across the stage 1 populations,
statistically significant differences (ANOVA p < 10^-16)
were noted (Supplemental Figure 1). We also applied
principal components analysis (PCA) (Price et al. 2006) to
estimate axes of variation among the 5,761 individuals
using the same 2,546 ancestry-informative markers. The
first eigenvector accounted for 10.1 % of the variation
between subjects, and subsequent eigenvectors accounted
for not more than 0.5 %. Using input genotypes from the
HapMap populations, CEU (CEPH Utah), YRI (Yoruba),
and JPT (Japanese), we determined that the first eigenvector
clearly differentiates between Europeans (CEU) and
West Africans (YRI) in the HapMap samples (Supplemental
Fig. 2).

Statistical  analysis 

We examined the observed versus the expected distribution
of the Chi-squared test statistics using a 1-degree of freedom
(df) trend test, comparing genotype counts for each
SNP in cases versus controls. All tests of statistical significance
were two-sided. To improve coverage, we augmented
the set of SNPs tested for association through
imputation using MACH (Li and Abecasis 2006). Phased
haplotypes from the 120 CEU and 120 YRI founders in
HapMap Phase 2 were used to infer genotypes of all Phase
2 SNPs that were not available on the Illumina 1M Duo or
did not pass our quality control (QC) criteria. Odds ratios
(OR) and 95 % confidence intervals (CI) for each SNP
were estimated using unconditional logistic regression,
adjusting for age, the first eigenvector and study. The
SFBCS and NC-BCFR studies were conducted in the same
San Francisco Bay Area population and were combined in
all analyses.
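A 1-df trend test on genotype counts, as mentioned above, can be sketched with the Cochran-Armitage statistic. This is an unadjusted illustration with assumed names and input layout (counts of subjects carrying 0, 1 and 2 copies of the tested allele); it is not the study's code and does not include the covariate adjustment that the logistic regression models provide.

```python
import math
from typing import Sequence, Tuple


def trend_test(cases: Sequence[int], controls: Sequence[int]) -> Tuple[float, float]:
    """Cochran-Armitage trend test with allele-dosage weights 0, 1, 2.

    Returns (chi-squared statistic on 1 df, two-sided p value).
    """
    weights = (0, 1, 2)
    n_cases, n_controls = sum(cases), sum(controls)
    n_total = n_cases + n_controls
    if n_cases == 0 or n_controls == 0:
        return 0.0, 1.0  # one group is empty: nothing to compare

    columns = [r + s for r, s in zip(cases, controls)]
    statistic = sum(w * (r * n_controls - s * n_cases)
                    for w, r, s in zip(weights, cases, controls))
    sum_w = sum(w * n for w, n in zip(weights, columns))
    sum_w2 = sum(w * w * n for w, n in zip(weights, columns))
    variance = (n_cases * n_controls / n_total) * (n_total * sum_w2 - sum_w ** 2)
    if variance == 0:
        return 0.0, 1.0  # monomorphic SNP: no information

    chi2 = statistic ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-squared, 1 df
    return chi2, p_value
```

For example, `trend_test((10, 20, 40), (30, 20, 10))` uses hypothetical counts in which cases carry the allele more often than controls, so the statistic is large and the p value small.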

In the replication studies, ORs and 95 % CIs for each
SNP were estimated using unconditional logistic regression,
adjusting for age, region within the WHI and estimated
genetic ancestry. Ancestry information was
available for all stage 2 studies except WISE (Supporting
Information). Overall testing of single SNP associations
was conducted via meta-analyses of results from the stage
1 and stage 2 studies.
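One common way to combine per-study results is a fixed-effect, inverse-variance-weighted meta-analysis of log odds ratios, sketched below. The text does not state which weighting scheme was used, so the fixed-effect choice, the function name and the input format (one log odds ratio and standard error per study) are all assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple


def fixed_effect_meta(
    estimates: Sequence[Tuple[float, float]]
) -> Tuple[float, float, float]:
    """Pool (log odds ratio, standard error) pairs with inverse-variance weights.

    Returns (pooled odds ratio, standard error of the pooled log OR,
    two-sided p value from a normal approximation).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    pooled_log_or = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_log_or / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return math.exp(pooled_log_or), pooled_se, p_value


# Hypothetical per-study estimates (not study data): two studies, same OR and SE.
pooled_or, pooled_se, p_value = fixed_effect_meta(
    [(math.log(1.2), 0.1), (math.log(1.2), 0.1)]
)
```

Studies with smaller standard errors, typically the larger ones, receive proportionally more weight in the pooled estimate.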

We also conducted combined GWAS and admixture-based
statistical tests to assess the contribution of local
ancestry on the SNP associations. For each subject in our
analysis, we inferred local ancestry, which defines the
proportion of European and African ancestry at each
genotyped and imputed SNP. To infer local ancestry in our
GWAS panel of 5,761 African American women, we
applied the program HAPMIX (Price et al. 2009). HAPMIX
builds a Hidden Markov Model (HMM) using phased
haplotype data that are representative of the two source
populations assumed to be ancestral to the admixed (study)
data. In this case, we provided the same HapMap dataset
that was used for imputation (i.e., 240 CEU + YRI founder
haplotypes per chromosome) as input. HAPMIX
reports posterior probabilities for each subject at each SNP
of carrying 0, 1 and 2 copies of a European allele.

Combined GWAS and admixture-based statistical tests
were conducted to make inferences about regions of the
genome that explain not only case-control differences in
disease risk based on SNP associations, but also risk differences
based on local genetic ancestry. We utilized the
MIXSCORE program (Pasaniuc et al. 2011), which takes as
input results from a GWAS scan and an admixture scan
(specifically HAPMIX output), and computes several statistics
that incorporate allele frequency information from
both sources of evidence. The SUM score is a 2-df
Chi-squared test that simultaneously tests for association (i.e., a
case-control difference in allele frequency) and admixture
evidence (i.e., a deviation from the genome-wide proportion
of European ancestry). The MIX score also tests for
both evidence of admixture and association, but assumes
the odds ratios for admixture and association are equal,
which is potentially more powerful when this assumption is
true since it is a 1-df test.


Results 

The stage 1 analysis included 3,016 cases and 2,745 controls
among African American women from 9 epidemiological
studies of breast cancer. The age of the cases and
controls in stage 1 ranged from 22 to 87 years with the
median ages being 55 and 58 years, respectively (Supplemental
Table 1). The analysis of the most statistically
significant associations from stage 1 was conducted in
3,533 cases and 11,046 controls from an additional 9
studies. The age of the cases and controls in stage 2 ranged
from 18 to 92 years with the median ages being 50 and
53 years, respectively (Supplemental Table 2).

We observed no evidence of inflation of the test statistic
(λ = 1.01) for the 1,043,036 genotyped and 2,067,098
imputed SNPs analyzed in stage 1, and no excess of very
small p values beyond what was expected (Fig. 1). We
observed no SNP to be associated with disease status at a
genome-wide level of significance (p < 5 x 10^-8) in stage
1 (Fig. 2). The most statistically significant association was
noted with SNP rs7610073 located in intron 2 of the gene
GRM7 (metabotropic glutamate receptor 7) on chromosome
3p26 (risk allele frequency 0.64; OR per allele 1.22;
p = 7.4 x 10^-7). A second signal was also noted ~486 kb
upstream of GRM7 (rs10510333: risk allele frequency 0.18;
OR per allele 1.24; p = 8.2 x 10^-6). The associations with
these two markers were independent and remained statistically
significant when both were included in the same model
(p values of 8.3 x 10^-7 and 9.3 x 10^-6, respectively).
Shown in Table 1 are the genotyped SNPs with p values
<10^-5 in stage 1, as well as SNPs that replicated in stage 2
(discussed below).

Fig. 1 The distribution of observed versus expected -log10 p values
from stage 1 adjusted for age, study and the first principal component
(PC1)
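The inflation factor reported for stage 1 is conventionally the median of the observed 1-df Chi-squared statistics divided by the median expected under the null. The sketch below shows that calculation; the function names and the use of p values as input are assumptions for illustration, and the exact procedure used in the study is not described in the text.

```python
from statistics import NormalDist, median
from typing import Sequence

_STANDARD_NORMAL = NormalDist()

# Median of a chi-squared distribution with 1 df (about 0.455).
CHI2_1DF_MEDIAN = _STANDARD_NORMAL.inv_cdf(0.25) ** 2


def chi2_from_p(p_value: float) -> float:
    """Convert a two-sided p value (0 < p <= 1) to a 1-df chi-squared statistic."""
    return _STANDARD_NORMAL.inv_cdf(p_value / 2.0) ** 2


def inflation_factor(p_values: Sequence[float]) -> float:
    """Genomic inflation factor: observed median statistic over the null median."""
    return median(chi2_from_p(p) for p in p_values) / CHI2_1DF_MEDIAN
```

A value close to 1 indicates that the bulk of the test statistics follows the null distribution, which is what one expects when population structure has been adequately controlled.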

We selected 66 genotyped SNPs with association p values
less than 2 x 10^-4 for replication testing in the stage 2
studies. None of these SNPs replicated with stage 2-wide
significance of <0.0008 (0.05/66), but two replicated with a
p value <0.05 and an OR in the same direction as that
observed in stage 1 (Table 1). Combining results from
stages 1 and 2, no SNP achieved genome-wide significance.
The smallest combined p values were noted for the
two SNPs that replicated in stage 2: rs4322600 located
~100 kb upstream of the gene GALC (galactosylceramidase)
on chromosome 14q31 (risk allele frequency 0.78, OR
per allele 1.18, p = 4.3 x 10^-6) and rs10510333 located
~486 kb upstream of GRM7 on chromosome 3p26 (risk
allele frequency 0.18, OR per allele 1.15, p = 1.5 x 10^-5)
(Table 1). We found no strong statistical evidence that the
associations with these two loci differ by ER status
(p values for heterogeneity in case-only testing:
rs10510333: p = 0.67; rs4322600: p = 0.85).

Using the MIXSCORE program, we simultaneously
tested the null hypothesis of no association and admixture
at each locus defined by the 66 most significant variants
identified in Stage 1. SNP rs7610073, which had the largest
MIX score of 24.5 (p = 7.5 x 10^-7), also had the smallest
p value in the first stage (Supplemental Table 4). The risk
allele (the "A" allele for rs7610073) was not strongly
differentiated (60 % in HapMap YRI vs. 81 % in HapMap
CEU) and the MIX score p value was almost identical to
the p value from our association scan. Association p values
were generally stronger than the SUM or MIX score, so
admixture did not make a substantive contribution in joint
evidence of admixture and association for these 66 SNPs,
as indicated in Supplemental Table 4. Altogether, these
findings seem to indicate that the associations at the most
significant loci in Stage 1 are not influenced by differences
in local ancestry between cases and controls, meaning that
any causal variants in these regions are not appreciably
differentiated in frequency between cases and controls.

Fig. 2 A Manhattan plot showing the -log10 p values which test for
case-control association to disease for genotyped and imputed SNPs
by chromosome in stage 1


Discussion 

Genome-wide studies of common and rare genetic variation
conducted in multiple populations will be required to reveal
the complete spectrum of susceptibility alleles that contribute
to risk of breast cancer globally. In a genome-wide
scan of common genetic variation in >3,000 African
American cases and >2,700 controls, followed by replication
testing of the most significant associations (p < 2 x
10^-4) in an independent set of >3,500 cases and >11,000
controls, we identified two suggestive associations with
breast cancer risk that replicated in stage 2 at p < 0.05
[chromosome 14q31 (p = 4.3 x 10^-6) and 3p26 (p =
1.5 x 10^-5)]; however, these associations did not reach the
standard level of genome-wide significance. These regions
have not been highlighted in previous GWAS conducted in
other racial/ethnic populations and each association
requires further validation in additional studies.

Populations  of  African  ancestry  have  greater  genetic 
diversity  and  lower  levels  of  LD  among  chromosomal  loci 
(Campbell  and  Tishkoff  2008;  Reed  and  Tishkoff  2006). 
Because  of  LD  patterns  and  allele  frequencies  that  differ 
from non-African populations, GWAS results from European
or Asian populations are not always replicable in
populations  of  African  ancestry  (Chen  et  al.  2010;  Huo 
et  al.  2012;  Hutter  et  al.  2011;  Ruiz-Narvaez  et  al.  2010; 
Zheng  et  al.  2009a).  Fine  mapping  of  known  breast  cancer 
risk  loci  in  populations  of  African  ancestry  has  revealed 
risk-associated  markers  that  are  more  relevant  to  African 
populations  and  contribute  to  modeling  of  genetic  risk  in 
this  population  (Chen  et  al.  2011;  Ruiz-Narvaez  et  al.  2010; 
Udler  et  al.  2009).  Large  GWAS  in  populations  of  African 
ancestry,  with  proper  control  of  population  structure,  will 
be  required  to  discover  additional  disease  susceptibility 
variants  that  better  define  the  genetic  profile  of  breast 
cancer  in  this  population. 

A strength of the present study is that it includes most
existing case-control studies of breast cancer conducted in
women of African ancestry. In this two-stage design, we
had 80% statistical power to identify a common risk
variant (frequency of >10%) that conveys a risk per allele
of 1.3 at genome-wide significance (p = 5 × 10^-8). Thus,
we were able to rule out variants with large effects if they
were among the top 0.007% in stage 1 (and thus taken to
stage 2) and were adequately tagged by the common SNPs
on the 1M array. However, we are likely to have missed
some  milder  associations.  In  previous  GWAS  of  breast 
cancer  in  European  ancestry  populations,  most  risk  variants 
eventually  identified  were  not  among  the  most  statistically 
significant  in  stage  1  and  were  only  revealed  through 
testing  of  large  numbers  of  SNPs  in  additional  replication 
stages. Identifying novel risk loci for breast cancer in
African ancestry populations will require continued collaborative
efforts and investigators willing to test larger
numbers of SNPs in their respective studies.

Our  attempt  to  apply  joint  admixture  and  association 
mapping,  using  MIXSCORE,  did  not  provide  additional 
suggestive  risk  variants  beyond  those  found  using  association 
methods  alone.  This  suggests  that  the  associations  observed  at 
the  most  significant  regions  in  Stage  1  are  not  weakened  by 
ancestry  differences  between  cases  and  controls,  and  thus,  the 
biologically functional alleles are unlikely to be highly differentiated
in frequency between cases and controls. Because
of the limited number of ER-negative cases in stage 1
(n = 988) and stage 2 (n = 423), the statistical power to examine
subtypes with rate differences (e.g., ER-negative disease,
more common in African American than European American
women) was limited, and such analyses were not attempted for GWAS or
admixture testing. However, in collaboration with GWAS of
ER-negative breast cancer in European ancestry populations,
which have substantially larger numbers of ER-negative
cases, we have identified a novel locus for ER-negative breast
cancer at 5p15 (TERT) (Haiman et al. 2011b). Genetic variation
at this locus may contribute in part to the higher incidence
of ER-negative disease subtypes in women of African
ancestry (risk allele frequency of 0.56 in African Americans and
0.26 in Whites) (Haiman et al. 2011b). As for the
analysis  of  overall  breast  cancer,  larger  studies  of  breast 
cancer  in  women  of  African  ancestry  will  be  needed  to  search 
for  novel  risk  loci  for  ER-negative  disease  subtypes  that  are 
important  for  and  may  be  limited  to  this  population. 

This  study  is  the  first  genome-wide  investigation  of 
common genetic variation in relationship with breast cancer
risk in women of African ancestry. The suggestive
associations  noted  with  risk  variants  at  14q31  and  3p26 
require  further  validation  in  additional  samples  of  African 
ancestry  as  well  as  in  other  populations.  Identification  of 
common  risk  variants  for  breast  cancer  in  African  ancestry 
populations will require testing a larger number of the most
statistically significant SNPs from stage 1 in additional
samples.

A common variant at the TERT-CLPTM1L locus is
associated with estrogen receptor-negative breast cancer

Estrogen receptor (ER)-negative breast cancer shows a higher
incidence in women of African ancestry compared to women
of European ancestry. In search of common risk alleles for ER-negative
breast cancer, we combined genome-wide association
study (GWAS) data from women of African ancestry (1,004
ER-negative cases and 2,745 controls) and European ancestry
(1,718 ER-negative cases and 3,670 controls), with replication
testing conducted in an additional 2,292 ER-negative cases and
16,901 controls of European ancestry. We identified a common
risk variant for ER-negative breast cancer at the TERT-CLPTM1L
locus on chromosome 5p15 (rs10069690: per-allele odds
ratio (OR) = 1.18, P = 1.0 × 10^-10). The variant was
also significantly associated with triple-negative (ER-negative,
progesterone receptor (PR)-negative and human epidermal
growth factor-2 (HER2)-negative) breast cancer (OR = 1.25,
P = 1.1 × 10^-9), particularly in younger women (<50 years of
age) (OR = 1.48, P = 1.9 × 10^-9). Our results identify a genetic
locus associated with estrogen receptor-negative breast cancer
subtypes in multiple populations.

Compared  to  women  of  European  ancestry,  women  of  African  descent 
are more likely to be diagnosed with ER-negative breast cancer (ref. 1). ER-negative
tumors and triple-negative tumors are observed at even
higher rates among African women currently residing in Africa (ref. 2),
suggesting a genetic component to the high risk of ER-negative phenotypes
in women of African descent. Similarly, ER-negative breast
cancers  and  triple-negative  breast  cancers  are  also  the  predominant 
histological  subtypes  in  women  with  germline  mutations  in  BRCA1 
(ref.  3).  The  enrichment  for  ER-negative  disease  in  this  genetically 
predisposed  population  also  suggests  the  existence  of  additional 
genetic  factors  that  contribute  to  the  risk  of  ER-negative  disease. 


Support  for  the  presence  of  these  factors  was  recently  provided  by 
a  GWAS  of  breast  cancer  in  BRCA1  mutation  carriers,  in  which  a 
common risk variant for ER-negative breast cancer on chromosome
19p13 was identified that also was significantly associated with
ER-negative and triple-negative disease in the general population (ref. 4).

To search for genetic risk factors for ER-negative breast cancer phenotypes,
we combined results from a GWAS of breast cancer in African-American
women (African American Breast Cancer Consortium
(AABC): 3,016 cases (1,004 with ER-negative disease) and 2,745 controls)
with results from a GWAS of triple-negative breast cancer in women
of European ancestry (Triple-Negative Breast Cancer Consortium
(TNBCC): 1,718 cases and 3,670 controls). Genotyping in AABC was
conducted with the Illumina Infinium 1M Duo. In TNBCC, cases were
genotyped with the Illumina 660W array; a subset of cases from the
Mammary Carcinoma Risk Factor Investigation (MARIE) component
were genotyped using the Illumina CNV370 SNP array, and cases and
controls from the Helsinki Breast Cancer Study (HEBCS) component
were genotyped using the Illumina 550-Duo SNP array. Genotypes of
TNBCC cases were compared with GWAS data for publicly available
controls  (Online  Methods).  Both  studies  imputed  genotypes  for  common 
SNPs  in  phase  2  HapMap  populations  (release  21)  (Supplementary 
Table  1  and  Online  Methods).  A  total  of  3,154,485  SNPs,  genotyped 
and  imputed,  were  analyzed  in  stage  1  of  the  meta-analysis. 

We observed little evidence of inflation in the test statistics in
AABC (λ = 1.01) or TNBCC (λ = 1.04) or in the meta-analysis of the
two GWAS (λ = 1.02; Supplementary Fig. 1). In the combined results,
only SNP rs10069690 (NCBI36/hg18, chr5:1,332,790) located in
intron 4 of the TERT gene (encoding telomerase reverse transcriptase)
at chromosome 5p15 showed a genome-wide significant association
with ER-negative breast cancer (AABC: OR per allele = 1.32, P = 1.3 ×
10^-6; TNBCC: OR = 1.25, P = 1.2 × 10^-3; combined OR = 1.29,


Table 1 Association of rs10069690 at 5p15 and ER-negative breast cancer risk

| Stage | Consortium or study | Cases/controls (a) | RAF (b), T allele | Heterozygotes OR (95% CI) (c) | Homozygotes OR (95% CI) (c) | Per-allele OR (95% CI) (c) | P value (1-d.f.) (d) |
|---|---|---|---|---|---|---|---|
| 1 | AABC | 1,002/2,743 | 0.57 | 1.32 (1.05-1.67) | 1.74 (1.37-2.21) | 1.32 (1.18-1.48) | 1.3 × 10^-6 |
| 1 | TNBCC | 2,785/1,602 | 0.27 | 1.10 (0.97-1.26) | 1.53 (1.21-1.95) | 1.18 (1.07-1.30) | 1.0 × 10^-3 |
| 2 | BPC3 | 1,289/10,397 | 0.26 | 1.08 (0.96-1.22) | 1.19 (0.95-1.49) | 1.09 (0.99-1.19) | 0.077 |
| 2 | SEARCH | 933/5,966 | 0.26 | 1.23 (1.06-1.43) | 1.44 (1.10-1.89) | 1.21 (1.09-1.36) | 6.9 × 10^-4 |
| Combined | | 6,009/20,708 | | 1.15 (1.06-1.23) | 1.46 (1.29-1.64) | 1.18 (1.13-1.25) | 1.0 × 10^-10 |

(a) Number of cases and controls with genotype data for rs10069690. All subjects were directly genotyped. (b) Risk allele frequency (RAF) in controls. (c) Adjusted for age, study and principal
components in AABC. Adjusted for age and country in TNBCC. Adjusted for age, study and country (European Prospective Investigation into Cancer and Nutrition (EPIC) only) in BPC3. Adjusted
for age in SEARCH. Combined results are from the meta-analysis. (d) P for trend (one degree of freedom (1-d.f.)).


A  full  list  of  authors  and  affiliations  appears  at  the  end  of  the  paper. 

Figure 1 A regional plot of the -log10 P values for SNPs at the
chromosome 5p15 risk locus from the meta-analysis of the AABC and
TNBCC stage 1 studies. SNP rs10069690 is designated with the purple
diamonds. The colors depict the strength of the correlation (r^2) between
SNP rs10069690 and the SNPs tested in the region. The correlation is
estimated using 1000 Genomes Project (1KGP) data for the HapMap
CEU  population  (June  2010).  Squares  are  SNPs  that  were  genotyped  in 
AABC  and  TNBCC.  Circles  are  SNPs  that  were  genotyped  in  one  study  and 
imputed  in  the  other  or  imputed  in  both  studies.  The  blue  line  indicates 
the  recombination  rates  in  centimorgans  (cM)  per  megabase  (Mb).  Also 
shown  are  the  SNP  Build  36  coordinates  and  genes  in  the  region. 


P = 1.0 × 10^-8). Whereas SNP rs10069690 was genotyped in AABC, it
was imputed in TNBCC (R^2 = 0.55). To verify the imputed genotypes
and the significance of the association in TNBCC, we re-genotyped
rs10069690 in available DNA samples from 2,963 TNBCC cases and
1,632 study-specific TNBCC controls (Online Methods). Although
the overlapping samples between the TNBCC GWAS and the re-genotyping
study showed that the quality of imputation for rs10069690
in the GWAS was poor (Online Methods), the association with ER-negative
breast cancer for rs10069690 remained statistically significant
in the larger re-genotyped TNBCC sample (OR = 1.18, P = 1.0 ×
10^-3; Table 1 and Fig. 1) and in the new combined results for AABC
and the re-genotyped TNBCC sample (OR = 1.24, P = 1.6 × 10^-8).
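
The inflation factors (λ) quoted for AABC, TNBCC and the meta-analysis are conventionally defined as the median observed 1-d.f. chi-square statistic divided by its expected median under the null (about 0.455). The sketch below assumes that convention rather than reproducing the authors' software; the function name `genomic_inflation` and the evenly spaced demo P values are illustrative only.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(p_values):
    """Return lambda = median(observed chi-square) / expected null median.

    Each two-sided P value (0 < p <= 1) is converted back to a 1-d.f.
    chi-square statistic through the squared normal quantile.
    """
    nd = NormalDist()
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Illustrative input: P values spread evenly over (0, 1), i.e. no inflation.
null_like = [(i + 0.5) / 1000 for i in range(1000)]
print(round(genomic_inflation(null_like), 3))
```

Values close to 1, such as those reported here, indicate that population structure and other systematic biases have little effect on the bulk of the test statistics.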

To further confirm the association at 5p15, we genotyped SNP
rs10069690 in women of European ancestry, which included 8,365
cases (1,359 ER negative) and 10,935 controls from the US National
Cancer Institute Breast and Prostate Cancer Cohort Consortium
(BPC3) and 6,182 cases (933 ER negative) and 5,966 controls from
Studies of Epidemiology and Risk Factors in Cancer Heredity
(SEARCH). Evidence for replication was observed for rs10069690
and ER-negative breast cancer in both studies (BPC3: OR = 1.09,
P = 0.077; SEARCH: OR = 1.21, P = 6.9 × 10^-4; Table 1).

In combining the results across all studies (6,009 ER-negative
cases and 20,708 controls with genotype data), rs10069690 was significantly
associated with an increased risk of ER-negative breast cancer
(OR = 1.18, 95% confidence interval (CI), 1.13-1.25; P = 1.0 × 10^-10;
Table 1). The risk for heterozygote and homozygote carriers was 1.15
(95% CI, 1.06-1.23) and 1.46 (95% CI, 1.29-1.64), respectively. We
observed little evidence of heterogeneity for the reported association
for this variant by study or country in AABC (test for heterogeneity,
P_het = 0.86), TNBCC (P_het = 0.85) or BPC3 (P_het = 0.37;
Supplementary Table 2).

In an analysis of ER-positive cases, rs10069690 was only weakly
associated with risk in African Americans (AABC: 1,558 ER-positive
cases and 2,743 controls with genotype data, OR = 1.08, P = 0.10)
and in women of European ancestry (BPC3: 4,890 ER-positive cases
and 10,397 controls, OR = 1.03, P = 0.31; SEARCH: 3,534 ER-positive
cases and 5,966 controls, OR = 1.03, P = 0.37; combined for all
populations: OR = 1.04, P = 0.06, P_het = 0.64). The statistical power
to detect an OR of 1.18 (observed for ER-negative disease) for ER-positive
disease was >99% in the combined sample (9,982 cases and
19,106 controls), assuming the risk allele frequency of 0.26 in people
of European descent. This result suggests that the association with
breast cancer might be specific for ER-negative subtypes (P value for
case-only test of ER negative versus ER positive = 1.7 × 10^-4).
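
The >99% power figure can be sanity-checked with a rough normal approximation. The sketch below is not the authors' calculation: it assumes a simple allelic (allele-count) comparison, a two-sided α of 0.05, and the sample sizes and allele frequency quoted in the paragraph above. The function name `allelic_power` is hypothetical.

```python
import math
from statistics import NormalDist


def allelic_power(or_per_allele, raf_controls, n_cases, n_controls, alpha=0.05):
    """Approximate power of an allelic test for a given per-allele OR.

    Assumption: the log OR estimate is normally distributed with variance
    1/(2*N_cases*p1*q1) + 1/(2*N_controls*p0*q0), where p0 and p1 are the
    risk allele frequencies in controls and cases.
    """
    p0 = raf_controls
    odds_cases = or_per_allele * p0 / (1.0 - p0)
    p1 = odds_cases / (1.0 + odds_cases)
    variance = (1.0 / (2 * n_cases * p1 * (1 - p1))
                + 1.0 / (2 * n_controls * p0 * (1 - p0)))
    z_effect = abs(math.log(or_per_allele)) / math.sqrt(variance)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Probability of exceeding the critical value in the direction of the
    # effect; the opposite tail is negligible and ignored.
    return NormalDist().cdf(z_effect - z_crit)


# OR 1.18, RAF 0.26, 9,982 ER-positive cases and 19,106 controls.
print(allelic_power(1.18, 0.26, 9982, 19106))
```

Under these assumptions the approximate power is well above 0.99, in line with the statement that a true OR of 1.18 would almost certainly have been detected for ER-positive disease.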

We  further  stratified  the  cases  by  HER2  status  to  assess  whether 
this  region  may  be  a  risk  locus  for  triple-negative  disease.  In  AABC, 
BPC3 and SEARCH the association with rs10069690 was greater for
triple-negative tumors than for ER-negative, PR-negative, HER2-positive
tumors (Table 2), and, in combining all studies, including
TNBCC, the association with rs10069690 was significantly greater for
triple-negative disease (3,707 triple-negative cases and 19,728 controls
with genotype data, OR = 1.25, P = 1.1 × 10^-9; 376 ER-negative,
PR-negative, HER2-positive cases and 18,126 controls, OR = 1.03,
P = 0.71; P value for case-only test = 0.010). The association with
rs10069690 was also observed to be significantly greater for ER-negative
and  triple-negative  disease  at  younger  ages  (<50  years:  ER  negative, 


Table 2 Association of rs10069690 at 5p15 stratified by HER2 status

| Consortium or study | Subtype | Cases/controls (a) | Heterozygotes OR (95% CI) (b) | Homozygotes OR (95% CI) (b) | Per-allele OR (95% CI) (b) | P value (1-d.f.) (c) | Case-only P |
|---|---|---|---|---|---|---|---|
| AABC (d) | ER-PR-HER2- | 440/2,407 | 1.35 (0.97-1.89) | 1.78 (1.27-2.49) | 1.33 (1.14-1.55) | 3.0 × 10^-4 | 0.19 |
| | ER-PR-HER2+ | 115/2,407 | 1.83 (0.99-3.40) | 1.59 (0.82-3.05) | 1.15 (0.86-1.52) | 0.34 | |
| TNBCC | ER-PR-HER2- | 2,785/1,602 | 1.10 (0.97-1.26) | 1.53 (1.21-1.95) | 1.18 (1.07-1.30) | 1.0 × 10^-3 | - |
| BPC3 (e) | ER-PR-HER2- | 300/9,753 | 1.19 (0.93-1.52) | 1.64 (1.10-2.46) | 1.25 (1.04-1.49) | 0.015 | 0.13 |
| | ER-PR-HER2+ | 198/9,753 | 0.99 (0.73-1.33) | 0.95 (0.53-1.70) | 0.98 (0.78-1.23) | 0.87 | |
| SEARCH | ER-PR-HER2- | 182/5,966 | 1.42 (1.03-1.95) | 2.41 (1.47-3.95) | 1.51 (1.20-1.89) | 4.2 × 10^-4 | 0.058 |
| | ER-PR-HER2+ | 63/5,966 | 1.31 (0.79-2.16) | 0.27 (0.04-1.95) | 0.97 (0.64-1.46) | 0.88 | |
| Combined | ER-PR-HER2- | 3,707/19,728 (f) | 1.17 (1.06-1.30) | 1.69 (1.43-1.99) | 1.25 (1.16-1.34) | 1.1 × 10^-9 | 0.010 |
| | ER-PR-HER2+ | 376/18,126 | 1.15 (0.91-1.46) | 1.11 (0.73-1.70) | 1.03 (0.88-1.21) | 0.71 | |

(a) Number of cases and controls with genotype data for rs10069690. All subjects were directly genotyped. (b) Adjusted for age, study and principal components in AABC. Adjusted for age and
country in TNBCC. Adjusted for age, study and country (EPIC only) in BPC3. Adjusted for age in SEARCH. Combined results are from the meta-analysis. (c) P for trend (1-d.f.). (d) Excludes San Francisco
Bay Area Breast Cancer Study (SFBCS) and Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial (PLCO), as HER2 data were not available. (e) Excludes WHS, as HER2 data were not
available. (f) Includes TNBCC. Without TNBCC: 922 ER-PR-HER2- cases and 18,126 controls; OR per allele = 1.33 (1.20-1.48), P = 6.3 × 10^-8; heterozygotes: OR = 1.29 (1.09-1.53);
homozygotes: OR = 1.85 (1.47-2.33).

NATURE GENETICS VOLUME 43 | NUMBER 12 | DECEMBER 2011 1211

© 2011 Nature America, Inc. All rights reserved.


OR = 1.32, P = 1.4 × 10^-8; triple negative, OR = 1.48, P = 1.9 × 10^-9;
P for interaction with age = 0.035 and 3.2 × 10^-3, respectively;
Supplementary Table 3). We found no significant association with
rs10069690 among ER- and PR-positive cases when stratified by HER2
status (513 triple-positive cases and 18,126 controls, OR = 1.09,
P = 0.21; 2,808 ER-positive, PR-positive, HER2-negative cases
and 18,126 controls, OR = 1.04, P = 0.29), which suggests the association
may be limited to triple-negative disease and not all HER2-negative
tumors.

Similar to 8q24 (refs. 5-7) and 11q13 (refs. 8-10), the TERT-CLPTM1L
locus harbors multiple risk variants for different cancers (reviewed
in ref. 11). SNP rs10069690 is modestly correlated (r^2 = 0.13-0.43 in
1000 Genomes Project populations of European and African ancestry,
Supplementary Fig. 2) with variants found for serous ovarian cancer
(rs7726159), glioma (rs2736100) and lung cancer (rs2736100,
rs2735940) (refs. 12-14). Aside from risk variant rs2853676 found for glioma (ref. 14),
which we found to be associated with risk in TNBCC (P = 0.014,
r^2 = 0.05 with rs10069690), none of the known risk variants identified
for other cancers in the TERT-CLPTM1L region was significantly
associated with breast cancer risk in TNBCC or AABC. Although
rs7726159 was not tested in AABC or TNBCC (as it is not on the
Illumina arrays or in HapMap), it is noteworthy that the first common
risk variant identified for ER-negative breast cancer, at chromosome
19p13, is also associated with risk for serous ovarian cancer (ref. 15). The
TERT gene encodes the catalytic subunit of telomerase, which controls
telomere length, a process linked with genomic instability and implicated
in tumorigenesis. Sequencing of the coding exons of TERT in 96
African-American women (Online Methods) did not reveal a coding
variant strongly correlated with rs10069690. The TERT locus may
highlight  another  biological  process  common  to  the  pathogenesis  of 
ER-negative  breast  cancer  subtypes  and  serous  ovarian  cancer  that  is 
also  shared  with  other  cancers. 

Identification  of  the  variant  directly  responsible  for  the  association 
will be required to fully address the extent to which this locus contributes
to the greater incidence of ER-negative and triple-negative
tumors in women of African ancestry. However, it is notable that the
risk allele frequency of rs10069690 is greater in African American
women (frequency, 0.57) than in women of European ancestry (frequency,
0.26). If this variant is an equally good surrogate for the biologically
functional allele in each population, then this locus may
be responsible for a 15% (95% CI, 10-20%) higher incidence rate of
ER-negative  or  triple-negative  breast  cancer  in  women  of  African 
compared  to  European  ancestry  (Online  Methods).  Larger  studies 
with  well-characterized  tumor  pathology  information  will  be  needed 
to  determine  whether  the  association  we  observed  applies  to  all  ER- 
negative  disease  or  just  the  triple-negative  subtype.  Our  findings 
provide  further  support  for  the  presence  of  genetic  susceptibility  to 
ER-negative  breast  cancer  subtypes  and  demonstrate  the  importance 
of  discovery  efforts  in  multiple  populations. 

METHODS 

Methods  and  any  associated  references  are  available  in  the  online 
version  of  the  paper  at  http://www.nature.com/naturegenetics/. 


Note:  Supplementary  information  is  available  on  the  Nature  Genetics  website. 

ONLINE METHODS

Study  populations.  Stage  1  included  the  studies  of  the  AABC  and  the  TNBCC. 
AABC  includes  3,153  breast  cancer  cases  (1,017  ER  negative  and  1,608  ER 
positive)  and  2,831  controls  from  9  studies  (Supplementary  Table  1).  TNBCC 
is  composed  of  2,963  triple-negative  breast  cancer  cases  and  1,632  controls 
from  22  studies,  GWAS  genotype  data  from  an  additional  85  triple-negative 
breast cancer cases and 222 controls from HEBCS, and public GWAS genotype
data from 3,448 controls from Cancer Genetic Markers of Susceptibility
(CGEMS),  Wellcome  Trust  Case-Control  Consortium  (WTCCC),  KORA  and 
QIMR  (Supplementary  Table  1).  Replication  studies  include  8,365  breast 
cancer  cases  (1,359  ER  negative  and  5,255  ER  positive)  and  10,935  controls 
of  the  BPC3  and  6,182  breast  cancer  cases  (933  ER  negative  and  3,434  ER 
positive)  and  5,966  controls  of  the  SEARCH.  All  participants  in  these  studies 
have  provided  written  informed  consent  for  the  research,  and  approval  for  the 
study  was  obtained  from  the  ethics  review  boards  at  all  the  local  institutions. 
A  description  of  each  participating  study  is  provided  in  the  Supplementary 
Note.  Details  regarding  the  measurement  and  collection  of  ER,  PR  and  HER2 
data  for  each  study  are  provided  in  Supplementary  Table  4. 

Genotyping  and  quality  control.  Genotyping  in  AABC  was  conducted  using 
the Illumina Human1M-Duo BeadChip. Of the 5,984 samples in the AABC
Consortium (3,153 cases and 2,831 controls), we attempted genotyping of 5,932,
removing samples (n = 52) with DNA concentrations <20 ng/µl. Following genotyping,
we removed samples on the basis of the following exclusion criteria:
(i) unknown replicates (>98.9% genetically identical, n = 29); (ii) samples with
call rates <95% after a second attempt (n = 100); (iii) samples with <5% African
ancestry (n = 36) (discussed below); and (iv) samples with <15% mean heterozygosity
of SNPs on the X chromosome and/or similar mean allele intensities
of SNPs on the X and Y chromosomes (n = 6). In the analysis, we removed
SNPs with <95% call rates (n = 21,732) or minor allele frequencies (MAFs)
<1% (n = 80,193). The concordance rate for blinded duplicates was 99.95%.
We also eliminated SNPs with genotyping concordance rates <98% based on
the replicates (n = 11,701). The final analysis data set included 1,043,036 SNPs
genotyped on 3,016 cases (988 ER negative, 1,520 ER positive and the remaining
508 cases with unknown ER status) and 2,745 controls, with an average
SNP call rate of 99.7% and average sample call rate of 99.8%. The call rate for
rs10069690 was very high in stage 1 (99.9%) and similar in cases (99.9%) and
controls (99.9%). We also re-genotyped rs10069690 using TaqMan in 1,456 of
the  stage  1  samples;  the  concordance  was  99.8%. 

Genotyping  for  the  TNBCC  GWAS  was  conducted  on  1,577  cases  from  ten 
studies  (Australian  Breast  Cancer  Tissue  Bank  (ABCTB),  Bavarian  Breast  Cancer 
Cases  and  Controls  (BBCC),  Dana-Farber  Cancer  Institute,  Fox  Chase  Cancer 
Center,  GENICA,  MARIE,  Melbourne  Collaborative  Cohort  Study  (MCBCS), 
Prospective Study of Outcomes in Sporadic Versus Hereditary Breast Cancer
(POSH), Sheffield Breast Cancer Study (SBCS)) using the Illumina 660-Quad SNP
array. In addition, a set of MARIE cases (n = 56) were genotyped using the Illumina
CNV370 SNP array. HEBCS cases (n = 85) were genotyped using the Illumina
550-Duo SNP array, bringing the total number of cases to 1,718. Population
allele and genotype frequencies on healthy population controls (n = 222)
genotyped on Illumina HumanHap 370CNV in the NordicDB, a Nordic pool and
portal for genome-wide control data, were obtained from the Finnish Genome
Center. GWAS data for public controls (n = 3,448) were generated using the following
arrays: Illumina 660-Quad (QIMR), Illumina 550(v1) (CGEMS), Illumina 550
(KORA) and Illumina 1.2M (WTCCC). The combined total number of controls
was 3,670. These GWAS data were independently evaluated by an iterative quality
control process with the following exclusion criteria: MAF <0.01, call rate
<95%, Hardy-Weinberg equilibrium (HWE) P value < 1 × 10^-7 among controls
and sample call rate <98%. In total, we excluded cases failing in the genotyping
process (n = 5), previously unknown replicates (n = 2) and samples with call rates
<98% (n = 83), samples that failed sex check (n = 10), cases identified as
non-triple-negative breast cancer (n = 20) and related samples (n = 27). We removed
SNPs with <95% call rates or MAF <5%. Because a number of our samples were
genotyped at different locations, we removed SNPs if there was a difference of
>0.10 between the study allele frequency and the median frequency across all
studies. Eigensoft was used to evaluate confounding due to population stratification.
We removed 101 subjects that did not cluster with the CEU HapMap phase
2  samples,  resulting  in  1,562  cases  and  3,578  controls  in  the  GWAS  analyses. 
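
The sample accounting in the two genotyping paragraphs above can be checked by simple subtraction. The sketch below uses only the counts stated in the text; the helper name `remaining` and the dictionary labels are illustrative.

```python
def remaining(start, exclusions):
    """Return the number of samples left after applying every exclusion."""
    return start - sum(exclusions.values())


# AABC: 3,153 cases + 2,831 controls before quality control.
aabc_final = remaining(
    3153 + 2831,
    {
        "DNA concentration <20 ng/ul": 52,
        "unknown replicates": 29,
        "call rate <95%": 100,
        "<5% African ancestry": 36,
        "X/Y chromosome check": 6,
    },
)

# TNBCC GWAS: 1,718 cases + 3,670 controls before quality control.
tnbcc_final = remaining(
    1718 + 3670,
    {
        "failed genotyping": 5,
        "unknown replicates": 2,
        "call rate <98%": 83,
        "failed sex check": 10,
        "not triple negative": 20,
        "related samples": 27,
        "did not cluster with CEU": 101,
    },
)

print(aabc_final)   # 5761 = 3,016 cases + 2,745 controls
print(tnbcc_final)  # 5140 = 1,562 cases + 3,578 controls
```

Both totals agree with the final analysis sets reported for each consortium, and the AABC ER breakdown (988 + 1,520 + 508) also sums to the 3,016 cases analyzed.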


Re-genotyping of rs10069690 on 2,963 TNBCC cases and 1,632 study-specific
controls was conducted using a single multiplex on the iPLEX MassARRAY
platform (Sequenom). We removed 31 cases from MCCS that were
part of the MCCS replication sample in BPC3. SNPs and samples evaluated
on the iPLEX were excluded on the basis of the following criteria: SNP call
rate <97%, HWE P value < 0.001 among controls and sample call rate
<95% (for the overall experiment). The final data set of 2,849 cases and 1,602
controls for rs10069690 had a SNP call rate >99% and HWE P value of 0.53
in controls. The concordance rate, on the basis of blinded duplicates, was
100%. The concordance of the imputed (R^2 = 0.55) versus the genotyped
data was 70%.

Replication genotyping. In BPC3, genotyping of rs10069690 was performed
by TaqMan in five laboratories (Cancer Prevention Study II
Nutrition Cohort (CPS2) and Multiethnic Cohort (MEC) at the University
of Southern California; the Nurses’ Health Study (NHS) and the Women’s
Health Study (WHS) at Harvard University; EPIC at the German Cancer
Research Center in Heidelberg; MCCS at Melbourne University; and PLCO
at the NCI Core Genotyping Facility). Genotyping in SEARCH was performed
by TaqMan at Cambridge University. Genotype call rates were >92%
in cases and controls, and concordance of blinded duplicates was >99.5% in
all studies. The P value for HWE in controls was >0.01 in all studies except
WHS (P = 0.007).

DNA  sequencing.  Bi-directional  sequencing  of  the  15  coding  exons  of  TERT 
was performed in 96 African-American women using the ABI 3730xl DNA
Analyzer (Applied Biosystems). Sequencing purification was performed
using DyeEx 96 columns (Qiagen) following the manufacturer's standard protocol, and
PolyPhred was used for analyzing sequence traces (http://droog.gs.washington.edu/polyphred/).
More than 95% of samples were sequenced for each exon
except for exon 15 (n = 74) and 16 (n = 86). Exon 1, as well as 112 bp (9%)
of exon 2, could not be sequenced because of high GC content.

Statistical analysis. In AABC, we tested for gene dosage effects through a
one-degree-of-freedom likelihood ratio test in models adjusted for age, study
and genetic ancestry eigenvectors 1-10. OR and 95% CI were estimated using
unconditional logistic regression. In TNBCC, unconditional logistic regression
was used to assess single SNP associations also assuming a log-additive model,
adjusting for country and the first two principal components. For the analyses
of the iPLEX genotyping data on rs10069690, unconditional logistic regression
was used assuming a log-additive model and adjusting for age and country.

In  both  AABC  and  TNBCC,  phased  haplotype  data  from  the  founders  of  the 
CEU  and  YRI  HapMap  Phase  2  samples  (build  21)  were  used  to  infer  linkage 
disequilibrium  patterns  in  order  to  impute  untyped  markers.  For  both  studies, 
genome-wide  imputation  was  carried  out  using  the  software  MACH.  Filtered 
from  the  analysis  were  SNPs  with  R2  <  0.3. 

We  conducted  a  fixed-effect  meta-analysis  of  AABC  and  TNBCC  using  the 
inverse variance weighted method. The number of SNPs available for meta-analysis
from AABC and TNBCC was 3,055,415 and 2,134,490, respectively.
The union of these two data sets (3,154,485 SNPs) was meta-analyzed using
the  program  METAL. 

SNP rs10069690 was analyzed in BPC3 and SEARCH using logistic regression
controlling for age and study or country (BPC3 only). The meta-analysis
of  rsl0069690  from  AABC,  TNBCC,  BPC3  and  SEARCH  was  conducted  using 
the  inverse  variance  weighted  method.  Testing  for  heterogeneity  by  study  was 
evaluated  using  the  Q  statistic.  Case-only  analyses  were  performed  to  test  for 
differences  in  the  association  by  tumor  subtypes. 
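
As an illustration of the inverse variance weighted method and the Q statistic described above, the sketch below pools the four per-allele ORs for ER-negative disease from Table 1. It is a simplified stand-in for the authors' analysis, not a reproduction of it: standard errors are back-calculated from the rounded 95% CIs (an assumption), and the function name `fixed_effect_meta` is hypothetical.

```python
import math

Z95 = 1.959964  # two-sided 95% normal quantile

# (per-allele OR, lower 95% limit, upper 95% limit), copied from Table 1.
TABLE1 = {
    "AABC": (1.32, 1.18, 1.48),
    "TNBCC": (1.18, 1.07, 1.30),
    "BPC3": (1.09, 0.99, 1.19),
    "SEARCH": (1.21, 1.09, 1.36),
}


def fixed_effect_meta(studies):
    """Fixed-effect inverse-variance meta-analysis on the log OR scale."""
    betas, weights = [], []
    for odds_ratio, lower, upper in studies.values():
        # Standard error recovered from the width of the 95% CI.
        se = (math.log(upper) - math.log(lower)) / (2 * Z95)
        betas.append(math.log(odds_ratio))
        weights.append(1.0 / se ** 2)
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    p = math.erfc(abs(beta / se) / math.sqrt(2))  # two-sided normal P value
    return {
        "or": math.exp(beta),
        "ci": (math.exp(beta - Z95 * se), math.exp(beta + Z95 * se)),
        "p": p,
        "q": q,
        "df": len(betas) - 1,
    }


result = fixed_effect_meta(TABLE1)
print(round(result["or"], 2), [round(x, 2) for x in result["ci"]])
```

With these rounded inputs the pooled estimate should land close to the combined row of Table 1 (OR = 1.18; 95% CI, 1.13-1.25); small differences in the P value are expected because the published analysis used unrounded estimates.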

We estimated the relative risk in African-ancestry women compared to
women of European descent that could plausibly be attributable to the
association with rs10069690. The attributable racial/ethnic ratio
(ARR) is calculated as

$$\mathrm{ARR} = \frac{\sum_{i=0}^{2} f_A(i)\,\mathrm{OR}^{i}}{\sum_{i=0}^{2} f_E(i)\,\mathrm{OR}^{i}},$$

where $f_A(i)$ is the probability in the African American women of carrying
$i$ = 0, 1 or 2 copies of the risk variant and $f_E(i)$ is the same probability
for European women. The per-allele OR is for triple-negative disease from the
meta-analysis (1.25), and both a log-linear model for risk and
Hardy-Weinberg equilibrium for the alleles (in both populations) are assumed.
A confidence interval for the ARR is calculated from
the confidence interval for the OR in the meta-analysis.
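
A minimal sketch of the ARR calculation follows, assuming Hardy-Weinberg genotype probabilities and a log-linear (per-allele multiplicative) risk model as stated above. The function names and the allele frequencies in the example are hypothetical placeholders, not values from the study; only the per-allele OR of 1.25 comes from the text.

```python
def hwe_genotype_probs(risk_allele_freq):
    """Probabilities of carrying 0, 1 or 2 risk alleles under Hardy-Weinberg."""
    p = risk_allele_freq
    q = 1.0 - p
    return [q * q, 2.0 * p * q, p * p]

def attributable_ratio(freq_african, freq_european, per_allele_or):
    """ARR = sum_i f_A(i) * OR^i / sum_i f_E(i) * OR^i, for i = 0, 1, 2."""
    f_a = hwe_genotype_probs(freq_african)
    f_e = hwe_genotype_probs(freq_european)
    numerator = sum(f * per_allele_or ** i for i, f in enumerate(f_a))
    denominator = sum(f * per_allele_or ** i for i, f in enumerate(f_e))
    return numerator / denominator

# Hypothetical allele frequencies; OR = 1.25 is the meta-analysis estimate.
print(round(attributable_ratio(0.6, 0.3, 1.25), 3))
```

Evaluating the same function at the lower and upper confidence limits of the OR gives the corresponding interval for the ARR, since the ratio is monotone in the OR when the risk allele is more frequent in one population.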

Genome-wide association studies (GWAS) have revealed 19 common genetic variants that are associated with
breast cancer risk. Testing of the index signals found through GWAS and fine-mapping of each locus in diverse
populations will be necessary for characterizing the role of these risk regions in contributing to inherited
susceptibility. In this large study of breast cancer in African-American women (3016 cases and 2745 controls),
we tested the 19 known risk variants identified by GWAS and replicated associations (P < 0.05) with only 4
variants. Through fine-mapping, we identified markers in four regions that better capture the association
with breast cancer risk in African Americans as defined by the index signal (2q35, 5q11, 10q26 and 19p13).
We also identified statistically significant associations with markers in four separate regions (8q24, 10q22,
11q13 and 16q12) that are independent of the index signals and may represent putative novel risk variants.
In aggregate, the more informative markers found in the study enhance the association of these risk regions
with breast cancer in African Americans [per allele odds ratio (OR) = 1.18, P = 2.8 x 10^-24 versus OR =
1.04, P = 6.1 x 10^-5]. In this detailed analysis of the known breast cancer risk loci, we have validated and
improved upon markers of risk that better characterize their association with breast cancer in women of
African ancestry.

INTRODUCTION

Genome-wide association studies (GWAS) of breast cancer
have identified at least 19 chromosomal regions that harbor
common alleles that contribute to genetic susceptibility
(1-10). These discoveries have allowed for improved
understanding of genetic risk for this common cancer, although it
is argued that many more markers will be needed to elucidate
disease heritability, and in the clinical setting for disease
prediction (11-13). Except for the breast cancer risk locus at
6q25 identified in a GWAS of Chinese women, the risk loci
for breast cancer have been revealed in studies in women of
European ancestry. We have recently shown in a multiethnic
study that a summary score comprised of the index variants
at many of these risk loci is statistically significantly
associated with breast cancer risk in multiple populations [odds
ratio (OR) per allele of >1.10], but not in African Americans
(14). Similar studies in African-American women have also
reported lack of replication with many of the reported index
signals (15-17). Limited statistical power of these initial
reports as well as variation in both allele frequency and
patterns of linkage disequilibrium (LD) across populations may
be contributing factors as to why the associations found in
the GWAS populations may not be generalizable to African
Americans. Association testing of the risk variants as well as
fine-mapping in a sufficiently large sample of African
Americans will be needed to identify and localize the subset of
markers that best define risk of the functional allele(s)
within known risk regions.

In the present study, we tested common genetic variation at
the breast cancer risk loci identified in women of European
and Asian descent in a large sample comprised of 3016
African-American breast cancer cases and 2745 controls to
identify markers of risk that are relevant to this population.
More specifically, we examined the index variants and
conducted fine-mapping of the locus to both improve the current
set of risk markers in African Americans as well as to identify
new risk variants for breast cancer. We then applied this
information to model breast cancer risk in African-American
women in an attempt to characterize the spectrum of genetic
risk in this population defined by common variants at the
known risk loci.


RESULTS 

The ages of cases and controls ranged from 22 to 87 years and
23 to 86 years, respectively, with cases and controls having
similar mean ages (55 and 58 years, respectively; Supplementary
Material, Table S1).

We tested 19 validated breast cancer risk variants (referred
to as ‘index variants’ throughout the paper) at 1p11, 2q35,
3p24, 5p12, 5q11, 6q25, 8q24, 9p21, 9q31, 10p15, 10q21,
10q22, 10q26, 11p15, 11q13, 14q24, 16q12, 17q23 and
19p13 in models adjusted for age, study, global ancestry (the
first 10 eigenvectors) and local ancestry (Table 1; Supplementary
Material, Table S2) (1-10); 17 SNPs were directly genotyped,
whereas 2 were imputed (r2 > 0.98; see Materials and
Methods). All 19 variants were common (>0.05) in African
Americans, with 11 variants being more common in
Europeans than in African Americans (Table 1, Fig. 1). In
previous GWAS, the index signals had modest ORs (1.05-1.29
per copy of the risk allele) and our sample size provided
>70% statistical power to detect the reported effects for 12 of
the 19 variants (at P < 0.05; Supplementary Material,
Table S2).

We observed positive associations with 11 of the 19 variants
(OR > 1); however, only 4 were statistically significant
(P < 0.05 at 2q35, 9q31, 10q26 and 19p13; Table 1). Of the
15 variants that were not replicated at P < 0.05, statistical
power was <70% for only 7 of the variants. Although
power was more limited, we also evaluated associations by
estrogen receptor (ER) status as some risk variants have
been found to be more strongly associated with ER-positive
(ER+) or ER-negative (ER-) breast cancer (2,18). We
observed positive associations with 12 variants (2 at
P < 0.05) for ER+ disease (n = 1520) and with 9 variants
for ER- (3 at P < 0.05; n = 988) (Supplementary Material,
Table S3). For only one variant did we observe statistically
significant risk heterogeneity by ER status (rs13387042 at
2q35, P = 0.013) (Supplementary Material, Table S3).

Local ancestry was included in all models, as it was found
to be associated with breast cancer risk in many regions
(Supplementary Material, Table S4). We observed nominally
significant associations between local ancestry and overall
breast cancer, ER+ or ER- disease risk at 5 loci (5p12,
6q25, 8q24, 10p15, 10q26). The most statistically significant
association was between European ancestry and ER+ breast
cancer risk at 6q25 (OR per European allele chromosome =
1.19, P = 6.2 x 10^-3). The inverse association observed
between European ancestry and ER+ disease risk at 10q26
(OR per European chromosome = 0.85, P = 0.011) is consistent
with previous reports of over-representation of African
ancestry at this locus in many of these same cases (19,20).

Aside from statistical power, the lack of a statistically
significant association with an index variant (OR > 1 and
P < 0.05) suggests that the particular variant revealed in the
GWAS populations may not be adequately correlated with
the biologically relevant allele in African Americans. In an
attempt to identify a better genetic marker of risk in African
Americans, we conducted fine-mapping across all risk
regions, using genotyped SNPs on the Illumina 1M array
and SNPs imputed to Phase 2 HapMap populations (see
Materials and Methods). If a marker associated with risk in
African Americans represents the same signal as that reported
in the initial GWAS, then it should be correlated to some
degree with the index signal in the GWAS population.
Using HapMap data for the populations in which the risk
variant was identified [Utah residents with ancestry from
northern and western Europe (CEU), or Han Chinese in
Beijing, China (CHB)], we catalogued and tested all SNPs
that were correlated (r2 > 0.2) with the index signal (within
250 kb), applying an α_a of 3.2 x 10^-3, which was estimated
to be 0.05 divided by the average number of tags needed to
capture (r2 > 0.8) the common risk alleles correlated with
the index allele in each region in the Yoruba HapMap
population [in Ibadan, Nigeria (YRI); Supplementary Material,
Table S5]. We also tested for novel independent associations,
focusing on SNPs that were uncorrelated with the index signal
in the initial GWAS populations. Here, we applied a Bonferroni
correction for defining novel associations as statistically

Table 1. Associations with common variants at known breast cancer risk regions in African Americans (3016 cases, 2745 controls)

Columns: Chr., nearest genes | Index SNP from GWAS: marker, position, alleles (risk/reference) | Index SNP: RAF in CEU/AA^a, OR (95% CI), Ptrend | Best marker in African Americans: marker, position, alleles (risk/reference) | Best marker: RAF in CEU/AA^a, OR (95% CI), Ptrend from stepwise analysis | r2 with index in CEU/YRI^b

1p11 | rs11249433, 120982136, G/A | 0.43/0.13, 1.01 (0.90-1.14), 0.84 | - | - | -
2q35 | rs13387042, 217614077, A/G | 0.56/0.72, 1.12 (1.03-1.21), 7.5 x 10^-3 | rs13000023^c, 217632639, G/A | 0.82/0.83, 1.20 (1.09-1.33), 5.8 x 10^-4 | 0.35/0.53
3p24, NEK10 | rs4973768, 27391017, T/C | 0.44/0.36, 1.04 (0.96-1.13), 0.32 | - | - | -
5p12, MRPS30 | rs4415084, 44698272, T/C | 0.38/0.63, 1.02 (0.95-1.11), 0.54 | - | - | -
5q11, MAP3K1 | rs889312, 56067641, C/A | 0.30/0.34, 1.07 (0.99-1.18), 0.084 | rs16886165, 56058840, G/T | 0.16/0.31, 1.15 (1.06-1.25), 6.5 x 10^-4 | 0.40/<0.01
6q25, C6orf97 | rs2046210^c,d, 151990059, A/G | 0.38/0.60, 1.00 (0.93-1.09), 0.88 | - | - | -
8q24 | rs13281615, 128424800, G/A | 0.45/0.43, 1.05 (0.97-1.13), 0.20 | - | - | -
9p21, CDKN2B | rs1011970, 22052134, T/G | 0.17/0.33, 1.05 (0.97-1.14), 0.24 | - | - | -
9q31 | rs865686, 109928199, T/G | 0.61/0.52, 1.08 (1.01-1.17), 0.034 | - | - | -
10p15, ANKRD16 | rs2380205, 5926740, C/T | 0.52/0.42, 0.98 (0.91-1.06), 0.60 | - | - | -
10q21, ZNF365 | rs10995190, 63948688, G/A | 0.87/0.83, 0.97 (0.88-1.08), 0.57 | - | - | -
10q22, ZMIZ1 | rs704010, 80511154, T/C | 0.43/0.11, 0.99 (0.87-1.12), 0.83 | rs12355688, 80725632, T/C | 0.090/0.20, 1.24 (1.13-1.36), 6.8 x 10^-6 | <0.01/<0.01
10q26, FGFR2 | rs2981582, 123342307, A/G | 0.46/0.46, 1.11 (1.03-1.19), 8.6 x 10^-3 | rs2981578^c, 123330301, C/T | 0.46/0.81, 1.24 (1.11-1.39), 1.7 x 10^-4 | 0.66/0.059
11p15, LSP1 | rs3817198, 1865582, C/T | 0.33/0.17, 0.98 (0.88-1.08), 0.63 | - | - | -
11q13 | rs614367, 69037945, T/C | 0.18/0.13, 0.96 (0.86-1.07), 0.45 | rs609275^c, 69112096, C/T | 1.00/0.59, 1.20 (1.11-1.30), 1.0 x 10^-5 | NA/<0.01
14q24, RAD51L1 | rs999737, 68104435, T/C | 0.26/0.051, 0.98 (0.82-1.17), 0.80 | - | - | -
16q12, TNRC9 | rs3803662, 51143842, A/G | 0.25/0.51, 0.99 (0.92-1.08), 0.85 | rs3112572, 51157948, A/G | 0.020/0.20, 1.18 (1.08-1.30), 3.9 x 10^-4 | 0.038/0.31
17q23, COX11 | rs6504950^c, 50411470, G/A | 0.70/0.66, 1.05 (0.97-1.14), 0.19 | - | - | -
19p13, ANKLE1 | rs2363956, 17255124, T/G | 0.45/0.49, 1.14 (1.05-1.22), 8.0 x 10^-4 | rs3745185, 17245267, G/A | 0.52/0.75, 1.20 (1.10-1.32), 3.7 x 10^-5 | 0.57/0.19

SNP positions are based on NCBI build 36.

ORs are per allele odds ratios adjusted for age, study, the first 10 eigenvectors and local ancestry at each risk locus.

Ptrend values are based on test of trend (1 d.f.).

^a RAF, risk allele frequencies in the original GWAS population (HapMap CEU, or CHB for rs2046210) and AA (African American) controls in this study. Risk allele is the allele associated with increased risk in previous GWAS.

^b Pairwise correlations (r2) between the index signal and the best marker are from the CEU (CHB for rs2046210) and YRI populations in the 1000 Genomes Project (March 2010 release).

^c Imputed SNPs.

^d Index signal reported in Han Chinese. RAFs based on HapMap CHB and r2 based on CHB in the 1000 Genomes Project (March 2010 release).

Figure 1. RAFs in Europeans and African Americans. The distribution of RAFs for the 19 index SNPs (from Table 1) in HapMap CEU (CHB for rs2046210)
and African Americans (AA). The variants are sorted based on the RAF in the GWAS population.


significant in each region, with α_b estimated to be 0.05 divided
by the total number of tags needed to capture (r2 > 0.8) all
common risk alleles in the 19 regions in the YRI population
(α_b = 1.0 x 10^-5; similar to the genome-wide-type correction
of 5 x 10^-8, which accounts for the number of tags needed to
capture all common alleles in the genome; Supplementary
Material, Table S5). For each region, stepwise logistic regression
was used with SNPs kept in the final model based on α_a or α_b
(results for each model are provided in Supplementary
Material, Tables S6 and S7). These procedures were applied to all
cases and controls as well as in hypothesis-generating analyses
stratified by ER status.
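
The two significance thresholds follow the usual Bonferroni form, 0.05 divided by a number of tag SNPs. The sketch below shows the arithmetic; the tag counts used in the example are back-calculated from the reported thresholds (3.2 x 10^-3 and 1.0 x 10^-5) and are assumptions, not values taken from Supplementary Table S5.

```python
def bonferroni_alpha(family_alpha, n_tests):
    """Per-test significance threshold for n_tests effective tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return family_alpha / n_tests

# Assumed tag counts implied by the reported thresholds:
#   0.05 / 3.2e-3 is about 16 tags per region on average (correlated SNPs)
#   0.05 / 1.0e-5 = 5000 tags in total across the 19 regions (novel signals)
alpha_correlated = bonferroni_alpha(0.05, 16)
alpha_novel = bonferroni_alpha(0.05, 5000)
print(f"{alpha_correlated:.2e}  {alpha_novel:.1e}")
```

The looser threshold applies to SNPs with prior support (correlation with an index signal), while the stricter one guards against false positives among uncorrelated SNPs tested across all regions.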

At nine loci, we detected variants that were statistically
significantly associated with breast cancer risk in African
Americans. These regions include 9q31, where the sole
marker of risk was the index signal (rs865686: OR = 1.08,
P = 0.034; Table 1). In five of these nine regions, the index
marker itself was not statistically significantly associated
with disease risk. Through fine-mapping, we revealed
markers in four regions that were more significantly associated
with risk than the index signal (>1 order of magnitude change
in the P-value) and are likely to capture the same signal (2q35,
5q11, 10q26 and 19p13). We also identified markers in four
regions that are not correlated with the index signal in the
GWAS populations (8q24, 10q22, 11q13 and 16q12) and
may represent putative novel risk variants, with one being
specific for ER+ disease (8q24) (Table 1, Fig. 2 and Supplementary
Material, Table S8). These regions are discussed in what
follows.


Risk  variants  that  better  define  the  index  signal 
in  African  Americans 

2q35. The index signal at 2q35 was statistically significantly
associated with risk of overall breast cancer (rs13387042:
OR = 1.12, P = 7.5 x 10^-3; Table 1) and ER+ disease
(OR = 1.22, P = 2.6 x 10^-4; Supplementary Material,
Table S3). However, we found stronger associations with
two markers that are each modestly correlated with the
index signal in CEU and YRI: rs13000023 with overall
breast cancer (OR = 1.20, P = 5.8 x 10^-4) and rs12998806
with ER+ disease (OR = 1.39, P = 3.3 x 10^-6) (Table 1
and Supplementary Material, Table S8). As shown in
Supplementary Material, Figure S1, the signal in this region appeared
limited to ER+ breast cancer, which is consistent with the
initial report of this risk locus (2) but not with subsequent
large-scale replication efforts in European populations (21).

5q11. We found a positive non-significant association with the
index signal at 5q11, which is located 79 kb centromeric of the
MAP3K1 gene (rs889312: OR = 1.07, P = 0.084; Table 1).
Fine-mapping revealed statistically significant associations
with markers rs16886165 for overall breast cancer (OR =
1.15, P = 6.5 x 10^-4) and rs832529 for ER- disease
(OR = 1.22, P = 1.3 x 10^-3; Table 1 and Supplementary
Material, Table S8). These SNPs show greater correlation with
the index signal in Europeans (CEU, r2 = 0.40 and 0.46)
than in Africans (YRI, r2 < 0.01 and r2 = 0.09), which
suggests that they may be better markers of the biologically
functional variant in African Americans (Table 1, Fig. 2).

10q26. Both the index signal, rs2981582 (OR = 1.11,
P = 8.6 x 10^-3; Table 1) and rs2981578, which was identified
previously through fine-mapping in African Americans (to which
some of these studies contributed) (22), were statistically
significantly associated with risk (OR = 1.24, P = 1.7 x
10^-4, Table 1). Variant rs2981578 was the most strongly
associated marker in the region for overall breast cancer and for
ER+ disease, which is consistent with previous reports of
variation in this region being more strongly associated with
ER+ breast cancer (Supplementary Material, Table S8) (18).
In fine-mapping the locus, we observed a suggestive
association with a correlated marker and ER- disease (rs2912774:
OR = 1.19, P = 2.1 x 10^-3; Supplementary Material, Table

Figure 2. -Log P plots for common alleles at eight breast cancer risk loci in African Americans. -Log P-values for risk-associated alleles in African Americans
from logistic regression models adjusted for age, study, global ancestry (the first 10 eigenvectors) and local ancestry. P-values are for overall breast cancer risk
except for 8q24, which is for ER+ breast cancer. Pairwise correlations (r2) in the HapMap CEU population are shown in relation to markers identified through
fine-mapping in African Americans (diamond), except for 11q13, where r2 is shown in HapMap YRI as the marker is monomorphic in CEU. Squares denote
genotyped SNPs; circles, imputed SNPs. Gray squares and circles denote that r2 cannot be estimated (not in HapMap or monomorphic in CEU). Red
arrows denote markers identified in African Americans; yellow arrows, GWAS index variants. Each panel shows a -log P plot for common alleles for
regions: (A) 2q35; (B) 5q11; (C) 8q24; (D) 10q22; (E) 10q26; (F) 11q13; (G) 16q12; (H) 19p13. The plots were generated using LocusZoom (55).

S8); however, the association was also noted with ER+
disease (OR = 1.10, P = 0.041; Supplementary Material,
Table S9) and is likely to capture the same signal as
rs2981578.

19p13. 19p13 was the first risk locus reported to harbor a
variant that may be specific for ER- disease (9). In African
Americans, the index variant was statistically significantly
associated with risk of overall breast cancer (rs2363956:
OR = 1.14, P = 8.0 x 10^-4), as well as ER+ (OR = 1.12,
P = 0.016) and ER- disease (OR = 1.14, P = 0.018;
Table 1 and Supplementary Material, Table S3). The most
significant association in the region for overall breast cancer
and ER+ disease was with rs3745185 (P = 3.7 x 10^-5 and
P = 8.2 x 10^-4, respectively), which is likely to capture the
same functional variant (r2 = 0.57 in CEU and 0.19 in YRI;
Table 1 and Supplementary Material, Table S8). The most
significant marker for ER- breast cancer was correlated with
both rs2363956 and rs3745185 (rs11668840: OR = 1.25,
P = 5.1 x 10^-5; Supplementary Material, Tables S8 and S10).

Novel  risk-associated  markers  at  breast  cancer 
susceptibility  loci 

8q24. Given the importance of the 8q24 locus in cancer, we
conducted association testing across the entire cancer risk
region (126.0-130.0 Mb) (23-25). The index signal
(rs13281615) was not statistically significantly associated
with risk in African Americans (Table 1 and Supplementary
Material, Table S3), nor did we identify significant
associations with correlated SNPs. However, we did detect a
significant association with rs16902056 and ER+ breast cancer [risk
allele frequency (RAF) 0.95; P = 6.7 x 10^-6; ER-: P =
0.66; Supplementary Material, Table S8]. This SNP is
located 78 kb centromeric of the index variant and is not
correlated with the index variant (r2 < 0.01 in CEU and r2 =
0.027 in YRI). No statistically significant associations were
observed with variants found previously in association with
cancers of the bladder and ovary, or leukemia (rs9642880:
OR = 1.03, P = 0.58; rs10088218: OR = 1.02, P = 0.62;
rs2456449: OR = 1.07, P = 0.14) (26-28). Of the known
risk variants for prostate cancer (29-35), we found a single
nominally significant (P < 0.05) association with the same
risk allele of rs1016343 (P = 0.015), which is located
>260 kb centromeric of the breast cancer risk region and is
not correlated with rs13281615 or rs16902056.

10q22. We observed no association with the index signal at
10q22 (rs704010), which is located in intron 1 of the gene
ZMIZ1, or with any correlated markers. However, we did
detect strong evidence of a second signal located 215 kb
telomeric in intron 12 of the gene ZMIZ1 (rs12355688: OR =
1.24, P = 6.8 x 10^-6). As is shown in Table 1 and Figure 2,
this putative novel risk variant is not correlated with the
index variant in the CEU or YRI populations (r2 < 0.01).

11q13. No positive association was noted with the index
variant at 11q13. However, we did detect evidence of a
second independent signal (rs609275: OR = 1.20, P = 1.0 x
10^-5), located 74 kb telomeric, and 53 kb centromeric of
CCND1. The variant is monomorphic and uncorrelated with
the index signal in the CEU population; and r2 with the
index signal in the YRI population is <0.01 (Table 1).

16q12. As in previous studies of African Americans, we were
not able to replicate the association signal defined by the index
variant rs3803662 (Table 1) (15,16). A recent study of African
Americans reported a suggestive association with SNP
rs3104746, which is located 15 kb telomeric of rs3803662
(16). This SNP has a minor allele frequency (MAF) of 0.04
in the HapMap CEU population, 0.19 in our African-American
controls, and is modestly correlated with rs3803662 in
Africans (r2 = 0.31 in YRI), but not in Europeans (r2 =
0.038; Supplementary Material, Table S10). Fine-mapping
around this putative signal revealed a perfect proxy (r2 = 1)
for rs3104746, rs3112572, which is significantly associated
with breast cancer risk in African Americans (OR = 1.18,
P = 3.9 x 10^-4), with the association noted to be stronger
for ER+ breast cancer (OR = 1.27, P = 3.1 x 10^-5;
Table 1 and Supplementary Material, Table S8).

For index SNPs found to be nominally associated with
breast cancer risk, as well as risk-associated markers identified
through fine-mapping, we also tested for associations by
genotype. Results from the genotype-specific model were
consistent with log-additive associations (Supplementary Material,
Tables S9 and S11). Risk variants at 2q35 and 8q24 were
also found to have significantly stronger associations with
ER+ breast cancer than ER- disease (Supplementary
Material, Table S7), which is consistent with previous studies (2,18).

We observed no statistically significant associations with
common variation at 10 risk loci on 1p11, 3p24, 5p12, 6q25,
9p21, 10p15, 10q21, 11p15, 14q24 and 17q23 (Supplementary
Material, Fig. S2). We also could not replicate the association
with the recently identified SNP rs9397435 at 6q25 that was
found through fine-mapping in European, African and Asian
population samples (17) (P = 0.26 for overall breast cancer,
P = 0.71 for ER+ and P = 0.36 for ER- tumor subtypes).
Nor could we replicate, in our African-American sample, the
association with SNP rs4784227 at 16q12, which was identified
by a recent multistage GWAS in women of Asian ancestry (36)
(P = 0.51 overall, P = 0.35 and
P = 0.65 for ER+ and ER- subtypes, respectively).

Risk  modeling 

We next estimated the cumulative effect of all breast cancer
risk variants, and compared a summary risk score comprised
of unweighted counts of all GWAS-reported risk variants
with a risk score that included variants we identified as
being associated with risk in African Americans (Table 2).
Using the 19 index signals from GWAS (see Materials and
Methods), the risk per allele was 1.04 [95% confidence
interval (CI) 1.02-1.06; P = 6.1 x 10^-5], and individuals in the
top quintile of the risk allele distribution were at 1.4-fold
greater risk (P = 7.4 x 10^-5) of breast cancer compared
with those in the lowest quintile (Table 2). As expected, the
risk score was improved when utilizing the markers that we
identified at the known risk loci as being more relevant to
African Americans (eight markers for overall breast cancer:
2q35, 5q11, 9q31, 10q22, 10q26, 11q13, 16q12 and 19p13;

Table 2. The association of the total risk score with breast cancer risk in African Americans

Columns: Measure | Index markers from GWAS (19 markers) | Risk-associated best markers in African Americans^a (8 markers) | Best markers, first-degree family history negative^b | Best markers, first-degree family history positive^b

Mean number of risk alleles in controls (range) | 15.7 (6-25) | 8.4 (3-14) | - | -
Per allele OR (95% CI) | 1.04 (1.02-1.06) | 1.18 (1.14-1.22) | - | -
Ptrend | 6.1 x 10^-5 | 2.8 x 10^-24 | - | -
Subjects, n cases/n controls | 3016/2745 | 3016/2745 | 2387/2349 | 554/303

Risk quintiles^c
Q1, n cases/n controls | 536/549 | 352/462 | 281/387 | 62/57
Q1, OR (95% CI) | 1.00 (ref.) | 1.00 (ref.) | 1.00 (ref.) | 1.58 (1.06-2.37)
Q1, P-value | - | - | - | 0.025
Q2, n cases/n controls | 722/742 | 430/505 | 344/437 | 77/47
Q2, OR (95% CI) | 0.99 (0.84-1.16) | 1.17 (0.96-1.42) | 1.15 (0.93-1.43) | 2.18 (1.46-3.26)
Q2, P-value | 0.88 | 0.11 | 0.18 | 1.5 x 10^-4
Q3, n cases/n controls | 435/382 | 632/625 | 503/549 | 115/53
Q3, OR (95% CI) | 1.15 (0.96-1.39) | 1.37 (1.14-1.64) | 1.31 (1.07-1.60) | 3.14 (2.17-4.53)
Q3, P-value | 0.14 | 7.2 x 10^-4 | 8.0 x 10^-3 | 1.2 x 10^-9
Q4, n cases/n controls | 753/669 | 665/566 | 517/476 | 132/75
Q4, OR (95% CI) | 1.16 (0.98-1.36) | 1.56 (1.30-1.87) | 1.51 (1.24-1.86) | 2.52 (1.81-3.52)
Q4, P-value | 0.080 | 2.3 x 10^-6 | 6.2 x 10^-5 | 4.0 x 10^-8
Q5, n cases/n controls | 570/403 | 937/587 | 742/500 | 168/71
Q5, OR (95% CI) | 1.44 (1.20-1.72) | 2.16 (1.80-2.58) | 2.11 (1.73-2.56) | 3.44 (2.47-4.77)
Q5, P-value | 7.4 x 10^-5 | 3.6 x 10^-17 | 1.3 x 10^-13 | 9.9 x 10^-14

ORs are adjusted for age, study and the first 10 eigenvectors.

Ptrend values are based on test of trend (1 d.f.).

^a The most significant markers from the stepwise analysis for overall breast cancer in each region from Table 1.

^b Information about first-degree family history of breast cancer is available on 97.5% of cases and 96.6% of controls.

^c Based on distribution in controls (cut points for index markers aggregate: 13.3, 15, 16, 18; cut points for best markers aggregate: 7, 8, 9, 10).


OR = 1.18; 95% CI 1.14-1.22; P = 2.8 x 10^-24), with risk
for those in the top quintile being 2.2 times that observed in
the lowest quintile (P = 3.6 x 10^-17). This score was
significantly associated with risk of both ER+ (OR = 1.20, P =
1.7 x 10^-19) and ER- (OR = 1.15, P = 2.8 x 10^-9)
disease (Phet = 0.12) (Supplementary Material, Table S12).
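
As an illustration of the summary score, the sketch below counts risk alleles across markers without weighting and assigns a quintile from control-based cut points. The cut points 7, 8, 9, 10 are those given in the Table 2 footnote for the eight best markers; the function names, the example genotypes and the handling of scores that equal a cut point (placed in the lower quintile) are assumptions.

```python
from bisect import bisect_left

def risk_score(risk_allele_counts):
    """Unweighted score: total number of risk alleles carried.

    risk_allele_counts: one value per marker, 0, 1 or 2
    (or a fractional dosage for an imputed SNP).
    """
    return sum(risk_allele_counts)

def quintile(score, cut_points):
    """Map a score to quintile 1..5 using four ascending cut points.

    Assumption: a score equal to a cut point falls in the lower quintile.
    """
    return bisect_left(cut_points, score) + 1

BEST_MARKER_CUTS = [7, 8, 9, 10]   # from the Table 2 footnote

# Hypothetical genotypes at the eight best markers for one woman.
genotypes = [2, 1, 0, 1, 2, 1, 1, 0]
score = risk_score(genotypes)
print(score, quintile(score, BEST_MARKER_CUTS))
```

In the study the per-quintile ORs were then estimated by logistic regression with the lowest quintile as the reference, adjusting for age, study and ancestry; that step is not reproduced here.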

Stratifying by first-degree family history of breast cancer
differentiated risk further: those with a family history
and in the top quintile of the risk score distribution (4% of
the population) had a 3.4-fold greater risk (P = 9.9 x
10^-14) compared with those without a family history and in
the lowest quintile of the risk score (Table 2).

In hypothesis-generating analyses, we also developed risk
scores for ER+ and ER- breast tumor subtypes, utilizing the
most informative markers revealed through fine-mapping of
each phenotype. These phenotype-specific scores were highly
significant (ER+: OR = 1.30, P = 6.0 x 10^-18; ER-: OR =
1.20, P = 2.3 x 10^-10) with statistically significant
heterogeneity noted when the scores were applied to the other subtype
(Phet = 1.7 x 10^-5 and 5.0 x 10^-3 for ER+ and ER- scores,
respectively) (Supplementary Material, Table S12).

DISCUSSION 

In  this  large  study  of  breast  cancer  in  African-American 
women,  we  were  able  to  replicate  associations  with  4  of  the 


19  index  variants  (at  P  <  0.05).  Through  fine-mapping,  we 
observed  that  overall  breast  cancer  risk  was  statistically  sig¬ 
nificantly  associated  with  markers  in  four  regions  which  are 
likely  to  capture  the  GWAS-reported  signal  and  to  serve  as 
better  markers  of  the  functional  allele  and  risk  in  African 
Americans.  We  also  detected  putative  novel  associations  that 
are  independent  of  the  index  signals  in  three  regions  for 
overall breast cancer (10q22, 11q13 and 16q12) and in one
region  for  ER+  disease  (8q24).  In  10  of  the  risk  regions, 
however,  we  were  not  able  to  replicate  the  GWAS  index 
signals,  nor  did  we  detect  statistically  significant  associations 
of common SNPs with breast cancer risk at the levels of statistical
significance we set for fine-mapping. The inability to replicate
associations with the index signals despite adequate
statistical  power  (>70%  power  for  12  of  19  variants)  suggests 
that  they  are  unlikely  to  be  functional  variants  or  capture  the 
functional  variants  as  efficiently  in  this  population.  Our  ability 
to  find  associated  markers  in  five  regions  where  index  signals 
were  not  significantly  associated  with  risk  also  demonstrates 
the  value  of  testing  common  variation  at  GWAS-identified 
risk  loci  in  additional  populations  (14,16,17,22,37,38). 

In  four  regions,  we  observed  risk  markers  that  are  correlated 
with,  and  in  the  same  LD  block  as  the  index  markers  in  CEU 
(rs13000023 at 2q35, rs16886165 at 5q11, rs2981578 at 10q26
and rs3745185 at 19p13). It is likely that these risk markers
capture the same signal as defined by the index markers
based on the r2 values between these markers and the index
markers  (>0.35).  We  cannot  rule  out  the  possibility,  though, 
that  some  of  them  may  represent  a  second,  independent 
signal  in  the  same  region. 

In  the  four  regions  where  we  observed  independent  signals, 
the risk alleles (rs16902056 at 8q24, rs12355688 at 10q22,
rs609275 at 11q13 and rs3112572 at 16q12) were uncorrelated
with, and not in the same LD block as, the index variant in
Europeans (CEU, r2 < 0.04) (distances from the index
signal ranged from 14 kb at 16q12 to 215 kb at 10q22) (Supplementary
Material, Fig. S3). Therefore, these variants are
likely  to  pick  up  a  novel  signal  independent  of  the  index 
signal.  However,  because  of  different  LD  patterns  in  European 
and  African  ancestry  populations,  they  may  each  mark  the 
same  functional  variant,  and  if  the  functional  variant  is  less 
common  it  may  not  be  well  captured  by  either  common 
marker  alone.  At  10q22,  both  the  index  SNP  and  the  novel 
variant  are  located  within  introns  of  the  ZMIZ1  gene.  ZMIZ1 
encodes  zinc  finger  MIZ-type  containing  1,  which  regulates 
the  activity  of  various  transcription  factors  (39-41).  At 
11q13, rs609275 lies 74 kb telomeric of the index signal and
in closer proximity to a number of candidate genes, including
CCND1 (encoding cyclin D1, a protein crucial for cell-cycle
control), ORAOV1 (encoding oral cancer overexpressed 1)
and FGF19 (encoding fibroblast growth factor 19). The association
at 16q12 confirms the findings of a previous, smaller
study of African Americans (16), and is consistent with a previous
fine-mapping study suggesting that African Americans
may  harbor  a  separate  causal  variant  in  this  region  (42). 
Whether  this  variant  is  influencing  the  same  genes/pathways 
as  the  index  variant  rs3803662  is  not  known;  however,  the 
stronger  associations  noted  for  both  variants  with  ER+ 
disease (2,18) suggest that they may affect the same biological
process. 

Notably, at region 19p13, which was originally reported in
association with ER- breast cancer (9), the index signal was
statistically significantly associated with both ER+ and
ER- subtypes in African Americans. In addition, we
found a stronger marker in this region (rs3745185) for
ER+ as well as overall breast cancer risk (Table 1 and Supplementary
Material, Table S8). We also found stronger
associations with ER+ than ER- disease for variants in
many regions, including 2q35, 8q24, 10q26 and 16q12,
which is consistent with previous reports (2,18). In the
study, we also found strong signals for ER- disease in
regions 5q11, 10q26 and 19p13. It is possible that these
signals may explain some of the excess risk for ER-
disease  in  African  Americans,  since  these  risk  alleles  have 
higher  frequencies  in  this  population  than  they  do  in 
European-ancestry  populations.  However,  our  understanding 
of  their  contribution  to  racial  and  ethnic  differences  in 
disease  incidence  will  only  be  determined  once  the  functional 
variants  have  been  identified  and  tested  across  populations. 
Unfortunately,  we  were  not  able  to  assess  associations  with 
triple-negative  (ER/PR/HER2-negative;  PR,  progesterone 
receptor;  HER2,  human  epidermal  growth  factor  receptor  2) 
breast  cancer,  since  HER2  status  was  available  for  only  a 
limited  number  of  cases.  However,  in  a  large  study  of 
women  of  European  ancestry  which  tested  many  of  these 
same index variants, further stratification on tumor subtype
using HER2 status was not additionally informative for
ER/PR-negative breast cancer (43).

The  observation  of  secondary  signals  at  many  loci,  and 
associations  of  variants  with  different  tumor  subtypes  that 
have  not  yet  been  reported  in  European-ancestry  populations 
could  indicate  a  different  genetic  architecture  of  breast 
cancer  across  populations.  For  example,  the  index  signal  at 
TNRC9  does  not  replicate  in  African  Americans,  but  there 
appears to be a second risk variant that is unique to this population.
At FGFR2, which was originally reported to be associated
with ER+ disease in women of European ancestry,
we found a signal for ER- disease with a marker correlated
with the index variant. Similarly, for chromosome 19p13,
which was reported as an ER- locus, we observed an association
with ER+ breast cancer. However, these findings and
their  implications  require  further  validation. 

We  investigated  local  ancestry  as  a  potential  confounding 
factor  in  the  analysis  of  each  risk  locus.  At  five  loci,  we 
observed  nominally  significant  evidence  of  association 
between  local  ancestry  and  breast  cancer  risk,  with  the  most 
statistically  significant  association  observed  at  6q25  between 
European  ancestry  and  ER+  breast  cancer  risk.  Although 
the  association  of  local  ancestry  and  breast  cancer  risk  needs 
to be validated in additional large studies, the inability to identify
a risk variant that is differentiated in frequency between
populations  of  European  and  African  ancestry  implies  that 
either  the  association  with  local  ancestry  at  many  regions  is 
a  false-positive  signal  and/or  we  have  not  tested  an  adequate 
surrogate  of  the  functional  alleles. 

The  majority  of  the  variants  identified  by  GWAS  for 
common  cancers  are  of  low  risk  (relative  risks  <1.30)  and 
in  aggregate  are  not  yet  informative  for  risk  prediction 
(11-13).  Until  the  functional  alleles  at  each  susceptibility 
locus  are  identified  and  their  effects  are  accurately  estimated, 
modeling  of  the  genetic  risk  will  rely  on  markers  that  best 
capture  risk  for  a  given  population.  Many  of  the  markers  we 
identified  at  these  risk  loci  appear  to  have  stronger  associations 
with breast cancer risk compared with the GWAS-identified
variants in African-American women. The risk score for
overall breast cancer was also equally efficient for ER+ and
ER- tumors. However, our hypothesis-generating model
suggests  that  identification  of  tumor  subtype-specific  variants 
will  improve  the  fit  of  these  models. 

While  this  is  the  largest  study  of  African  Americans  to  date 
to investigate genetic risk at known breast cancer susceptibility
loci, statistical power was still limited. We had only 35%
power to detect an OR of 1.10 for a risk allele of 0.10 frequency,
which may account for our inability to replicate
GWAS  signals  or  risk-associated  markers  in  10  of  the 
regions.  While  attempting  to  apply  a  strict  threshold  for 
declaring  significance  through  fine-mapping,  we  did  not  take 
into  account  testing  for  multiple  phenotypes  (overall  breast 
as well as ER+ and ER- disease). As a result, the α-levels
used as selection criteria may be too liberal. However, our
risk modeling focused on the variants revealed for overall
breast cancer, whereas we consider the associations observed
for markers identified for ER+ or ER- disease and used in
the subtype-specific risk modeling as hypothesis-generating.
Since all of the cases and controls used for fine-mapping/discovery
were also included in the risk modeling,
the risk model is likely to over-estimate the level of association
due to winner's curse. Instead of partitioning the
sample  into  test  and  validation  sets,  we  felt  it  was  necessary 
to  use  all  of  the  subjects  in  the  association  testing  of  known 
variants  and  in  fine-mapping  to  increase  the  statistical  power 
to  detect  associations  in  each  region.  Therefore,  other 
studies  with  reasonable  power  in  African  Americans  must  be 
performed  in  the  future  to  test  the  model  presented. 

In summary, through fine-mapping of the breast cancer susceptibility
regions in a large sample of African-American
women, we identified markers with enhanced association
with breast cancer in this population. Validation and augmentation
of this model are needed before risk modeling based on
genetic  variants  of  low  risk  can  be  implemented  in  the  clinical 
setting. 

MATERIALS  AND  METHODS 

Ethics  statement 

The  Institutional  Review  Board  at  the  University  of  Southern 
California  approved  the  study  protocol. 

Study  populations 

This  study  included  9  epidemiological  studies  of  breast  cancer 
among  African-American  women,  which  comprise  a  total  of 
3153 cases and 2831 controls. Sample size and selected characteristics
for these studies are summarized in Supplementary
Material, Table S1. What follows is a brief description of these
studies. 

The Multiethnic Cohort Study (MEC). The MEC is a prospective
cohort study of 215 000 men and women in Hawaii and
Los Angeles (44) between the ages of 45 and 75 years at baseline
(1993-1996). Through 31 December 2007, a nested
breast cancer case-control study in the MEC included
556 African-American cases (544 invasive and 12 in situ)
and 1003 African-American controls. An additional 178
African-American breast cancer cases (ages: 50-84) diagnosed
between 1 June 2006 and 31 December 2007 in
Los  Angeles  County  (but  outside  of  the  MEC)  were  included 
in  the  study. 

The Los Angeles component of The Women's Contraceptive
and Reproductive Experiences (CARE) Study. The CARE
Study is a large multi-center, population-based case-control
study that was designed to examine the effects of oral contraceptive
use on invasive breast cancer risk among
African-American  women  and  white  women  aged  35-64 
years  in  five  US  locations  (45).  Cases  in  Los  Angeles 
County  were  diagnosed  from  1  July  1994  through  30  April 
1998,  and  controls  were  sampled  by  random-digit  dialing 
(RDD)  from  the  same  population  and  time  period;  380 
African-American  cases  and  224  African-American  controls 
were  included  in  the  study. 

The  Women’s  Circle  of  Health  Study  (WCHS).  The  WCHS  is 
an ongoing case-control study of breast cancer among
European women and African-American women in the
New York City boroughs and in seven counties in New
Jersey  (46).  Eligible  cases  included  women  with  invasive 
breast  cancer  between  20  and  74  years  of  age;  controls  were 
identified  through  RDD.  The  WCHS  contributed  272  invasive 
African-American  cases  and  240  African-American  controls. 

The  San  Francisco  Bay  Area  Breast  Cancer  Study  (SFBCS). 
The SFBCS is a population-based case-control study of invasive
breast cancer in Hispanic, African-American and non-
Hispanic  white  women  conducted  between  1995  and  2003  in 
the  San  Francisco  Bay  Area  (47).  African-American  cases, 
aged  35-79  years,  were  diagnosed  between  1  April  1995 
and  30  April  1999,  with  controls  identified  through  RDD. 
Included  from  this  study  were  172  invasive  African-American 
cases  and  231  African-American  controls. 

The Northern California Breast Cancer Family Registry
(NC-BCFR).  The  NC-BCFR  is  a  population-based  family 
study  conducted  in  the  Greater  San  Francisco  Bay  Area,  and 
one  of  six  sites  of  the  Breast  Cancer  Family  Registry 
(BCFR)  (48).  African-American  breast  cancer  cases  in 
NC-BCFR  were  diagnosed  after  1  January  1995  and 
between  the  ages  of  18  and  64  years;  population  controls 
were  identified  through  RDD.  Genotyping  was  conducted 
for  440  invasive  African-American  cases  and  53 
African-American  controls. 

The  Carolina  Breast  Cancer  Study  (CBCS).  The  CBCS  is  a 
population-based  case-control  study  conducted  between 
1993  and  2001  in  24  counties  of  central  and  eastern  North 
Carolina (49). Cases were identified by a rapid case ascertainment
system in cooperation with the North Carolina Central
Cancer Registry, and controls were selected from the North
Carolina Division of Motor Vehicle and United States
Health Care Financing Administration beneficiary lists. Participants'
ages ranged from 20 to 74 years. DNA samples were
provided  from  656  African-American  cases  with  invasive 
breast  cancer  and  608  African-American  controls. 

The Prostate, Lung, Colorectal, and Ovarian Cancer Screening
Trial (PLCO) Cohort. PLCO, coordinated by the US
National  Cancer  Institute  (NCI)  in  10  US  centers,  enrolled 
approximately  155  000  men  and  women  aged  55-74  years 
during  1993-2001  in  a  randomized,  two-arm  trial  to  evaluate 
the  efficacy  of  screening  for  these  four  cancers  (50).  A  total  of 
64  African-American  invasive  breast  cancer  cases  and  133 
African-American  controls  contributed  to  this  study. 

The  Nashville  Breast  Health  Study  (NBHS).  The  NBHS  is  a 
population-based  case-control  study  of  incident  breast 
cancer  conducted  in  Tennessee  (15).  The  study  was  initiated 
in  2001  to  recruit  patients  with  invasive  breast  cancer  or 
ductal  carcinoma  in  situ,  and  controls,  recruited  through 
RDD  between  the  ages  of  25  and  75  years.  NBHS  contributed 
310  African-American  cases  (57  in  situ)  and  186 
African-American  controls. 

Wake  Forest  University  Breast  Cancer  Study  (WFBC). 
African-American  breast  cancer  cases  and  controls  in  WFBC 
were recruited at Wake Forest University Health Sciences
from November 1998 through December 2008 (51). Controls
were  recruited  from  the  patient  population  receiving  routine 
mammography  at  the  Breast  Screening  and  Diagnostic 
Center.  Age  range  of  participants  was  30-86  years.  WFBC 
contributed 125 cases (116 invasive and 9 in situ) and 153
controls  to  the  analysis. 
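
As a quick consistency check, the per-study counts quoted in the descriptions above can be tallied against the stated totals of 3153 cases and 2831 controls. The sketch below is illustrative only; the dictionary layout and the name `STUDY_COUNTS` are assumptions, and the 178 additional Los Angeles County cases are grouped with the MEC for convenience.

```python
# (cases, controls) per study, copied from the study descriptions above.
# Grouping the 178 additional Los Angeles County cases with the MEC is an
# assumption made here purely for bookkeeping.
STUDY_COUNTS = {
    "MEC": (556 + 178, 1003),
    "CARE": (380, 224),
    "WCHS": (272, 240),
    "SFBCS": (172, 231),
    "NC-BCFR": (440, 53),
    "CBCS": (656, 608),
    "PLCO": (64, 133),
    "NBHS": (310, 186),
    "WFBC": (125, 153),
}


def totals(counts):
    """Return (total cases, total controls) summed over all studies."""
    cases = sum(n_cases for n_cases, _ in counts.values())
    controls = sum(n_controls for _, n_controls in counts.values())
    return cases, controls


print(totals(STUDY_COUNTS))  # (3153, 2831)
```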

Genotyping  and  quality  control 

Genotyping in stage 1 was conducted using the Illumina
Human1M-Duo BeadChip. Of the 5984 samples from these
studies (3153 cases and 2831 controls), we attempted genotyping
of 5932, removing samples (n = 52) with DNA concentrations
<20 ng/µl. Following genotyping, we removed samples
based on the following exclusion criteria: (i) unknown replicates
(>98.9% genetically identical) that we were able to
confirm (only one of each duplicate was removed, n = 15);
(ii) unknown replicates that we were not able to confirm
through discussions with study investigators (pair or triplicate
removed, n = 14); (iii) samples with call rates <95% after a
second attempt (n = 100); (iv) samples with <5% African
ancestry (n = 36) (discussed in what follows); and (v)
samples with <15% mean heterozygosity of SNPs on the X
chromosome and/or similar mean allele intensities of SNPs
on the X and Y chromosomes (n = 6) (these are likely to be
males).

In the analysis, we removed SNPs with <95% call rates
(n = 21 732) or MAFs < 1% (n = 80 193). To assess genotyping
reproducibility, we included 138 replicate samples; the
average concordance rate was 99.95% (>99.93% for all
pairs). We also eliminated SNPs with genotyping concordance
rates <98% based on the replicates (n = 11 701). The final
analysis data set included 1 043 036 SNPs genotyped on
3016 cases (1520 ER+, 988 ER- and the remaining 508
cases  with  unknown  ER  status)  and  2745  controls,  with  an 
average  SNP  call  rate  of  99.7%  and  average  sample  call  rate 
of  99.8%. 
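
The sample flow described in this section can be reproduced arithmetically. The following sketch assumes that the five exclusion categories are mutually exclusive and that each n counts individual samples, which the text implies but does not state outright; the variable names are illustrative.

```python
# Sample-level quality control, using the counts quoted in the text.
INITIAL_SAMPLES = 5984
LOW_DNA = 52  # removed before genotyping

# Assumed to be non-overlapping categories of removed samples.
EXCLUSIONS = {
    "confirmed unknown replicates": 15,
    "unconfirmed unknown replicates": 14,
    "call rate <95% after second attempt": 100,
    "<5% African ancestry": 36,
    "likely males": 6,
}


def remaining_samples(initial, low_dna, exclusions):
    """Samples left after the pre-genotyping and post-genotyping filters."""
    attempted = initial - low_dna
    return attempted - sum(exclusions.values())


final_n = remaining_samples(INITIAL_SAMPLES, LOW_DNA, EXCLUSIONS)
print(final_n)  # 5761, i.e. 3016 cases + 2745 controls
```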

Statistical  analysis 

Ancestry  estimation.  We  used  principal  components  analysis 
(52)  to  estimate  global  ancestry  among  the  5761  individuals, 
using  2546  ancestry  informative  markers.  Eigenvector  1  was 
highly correlated (ρ = 0.997, P < 1 × 10^-16) with percentage
of European ancestry, estimated in HAPMIX (53), and
accounted for 10.1% of the variation between subjects; subsequent
eigenvectors accounted for no more than 0.5%. At each
locus  and  for  each  participant,  we  also  estimated  local  ancestry 
[i.e.  the  number  of  European  chromosomes  (continuous 
between  0  and  2)  carried  by  the  participant],  using  the 
HAPMIX  program  (53).  To  summarize  local  ancestry  at 
each  region,  for  each  individual  we  averaged  across  all  local 
ancestry  estimates  that  were  within  the  start  and  end  points 
of  the  region  (Supplementary  Material,  Table  S5).  To 
address  the  potential  for  confounding  by  genetic  ancestry, 
we  adjusted  for  both  global  and  local  ancestry  in  all  analyses. 
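
The regional summary of local ancestry described above amounts to a simple average over the estimates that fall inside a region. A minimal sketch follows, assuming the estimates are available as (position, number of European chromosomes) pairs; this input format and the function name are hypothetical and do not reflect HAPMIX output.

```python
def mean_local_ancestry(estimates, region_start, region_end):
    """Average local ancestry (0 to 2 European chromosomes) within a region.

    `estimates` is a hypothetical list of (position_bp, european_copies)
    pairs for one individual; only positions inside the region are used.
    """
    inside = [copies for pos, copies in estimates
              if region_start <= pos <= region_end]
    if not inside:
        raise ValueError("no local ancestry estimates inside the region")
    return sum(inside) / len(inside)


# Made-up values for illustration only.
example = [(100, 0.0), (200, 1.0), (300, 2.0), (900, 2.0)]
print(mean_local_ancestry(example, 100, 300))  # 1.0
```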

SNP  imputation.  In  order  to  generate  a  data  set  suitable  for 
fine-mapping,  we  carried  out  genome-wide  imputation  using 
the software MACH (54). Phased haplotype data from the
founders of the CEU and YRI HapMap Phase 2 samples
were used to infer LD patterns in order to impute ungenotyped
markers. The r2 metric, defined as the observed variance
divided by the expected variance, provides a measure of the
quality of the imputation at any SNP, and was used as a threshold
in determining which SNPs to filter from analysis
(r2 < 0.3). Of the 1 539 328 common SNPs (MAF > 0.05) in
the YRI population in HapMap Phase 2, we could impute
1 392 294 (90%) with r2 > 0.8. For all the imputed SNPs
presented in Results and the tables reported herein, the
average r2 was 0.92 (estimated in MACH).
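
The imputation quality metric described above can be sketched as a ratio of variances. The version below assumes that the expected variance is the binomial variance 2p(1 - p) under Hardy-Weinberg equilibrium, with p estimated from the mean imputed dosage; the text only states "observed variance divided by the expected variance", so this is one common reading and not necessarily the exact computation performed by the imputation software.

```python
from statistics import fmean, pvariance


def imputation_r2(dosages):
    """Ratio of observed dosage variance to the variance expected under HWE.

    `dosages` are imputed allele dosages in [0, 2], one per individual.
    The expected variance 2p(1 - p) is an assumption of this sketch.
    """
    p = fmean(dosages) / 2.0
    expected = 2.0 * p * (1.0 - p)
    if expected == 0.0:
        return 0.0  # monomorphic marker: no information
    return pvariance(dosages) / expected


R2_FILTER = 0.3  # SNPs below this threshold were filtered, per the text

confident = [0.0, 1.0, 1.0, 2.0]   # dosages close to hard genotype calls
uncertain = [0.9, 1.0, 1.0, 1.1]   # dosages shrunk towards the mean
print(imputation_r2(confident) >= R2_FILTER)  # True
print(imputation_r2(uncertain) >= R2_FILTER)  # False
```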

Association  testing.  For  each  typed  and  imputed  SNP,  ORs 
and  95%  CIs  were  estimated  using  unconditional  logistic 
regression  adjusting  for  age  at  diagnosis  (or  age  at  the 
reference date for controls), study, the first 10 eigenvectors
and local ancestry. For each SNP, we tested for allele
dosage effects through a 1 d.f. Wald χ2 trend test.
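
To make the per-SNP test concrete, the sketch below fits an unadjusted logistic regression of case status on allele dosage by Newton-Raphson and reports the OR, 95% CI and the 1 d.f. Wald χ2 P-value. It is a simplification: the models in this study were additionally adjusted for age, study, global ancestry and local ancestry, none of which are included here, and the function name and toy data are illustrative.

```python
from math import erfc, exp, sqrt


def wald_trend_test(dosages, is_case, iterations=25):
    """Unadjusted logistic regression of case status on allele dosage.

    Returns (odds ratio per allele, 95% CI, Wald chi-square, P-value).
    Covariates used in the study are deliberately omitted in this sketch.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(dosages, is_case):
            p = 1.0 / (1.0 + exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # score for the intercept
            g1 += (y - p) * x      # score for the dosage coefficient
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det   # Newton-Raphson update
        b1 += (h00 * g1 - h01 * g0) / det
    se = sqrt(h00 / det)                    # SE of the dosage coefficient
    chi2 = (b1 / se) ** 2                   # Wald statistic, 1 d.f.
    p_value = erfc(sqrt(chi2 / 2.0))        # upper tail of chi-square(1)
    ci = (exp(b1 - 1.96 * se), exp(b1 + 1.96 * se))
    return exp(b1), ci, chi2, p_value


# Toy data: 60 non-carriers (20 cases) and 60 carriers of one allele (40 cases).
toy_dosages = [0] * 60 + [1] * 60
toy_status = [0] * 40 + [1] * 20 + [0] * 20 + [1] * 40
odds_ratio, ci, chi2, p_value = wald_trend_test(toy_dosages, toy_status)
```

For this toy table the fitted OR equals the cross-product ratio (40 × 40)/(20 × 20) = 4, which gives a simple way to check the fit.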

We  fine-mapped  each  risk  locus  using  the  combined 
genotyped  and  imputed  SNPs  in  search  of  (i)  an  SNP  that  is 
more  associated  with  risk  in  African  Americans  than  the 
index  signal;  and  (ii)  a  novel  signal  that  is  independent  of 
the  index  signal.  As  some  risk  loci  have  been  found  to  be 
more  strongly  associated  with  breast  cancer  subtypes,  we 
investigated  three  outcomes:  (i)  overall  breast  cancer,  (ii) 
ER+  breast  cancer,  and  (iii)  ER—  breast  cancer,  with  the 
latter  two  being  hypothesis-generating.  These  analyses 
included  SNPs  (genotyped  and  imputed)  spanning  250  kb 
upstream  and  250  kb  downstream  of  each  index  signal.  If 
the  index  signal  was  contained  within  an  LD  block  (based 
on  the  D'  statistic)  of  >250  kb,  then  the  region  was  extended 
to  include  the  entire  region  of  LD. 
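
The window definition above can be written down as a small helper. This is a sketch under one interpretation of the text: the default window is the index position ± 250 kb, and when the index SNP sits in an LD block longer than 250 kb the window is widened to cover that whole block. The function name and the coordinates in the example are hypothetical.

```python
FLANK_BP = 250_000  # 250 kb upstream and downstream of the index signal


def fine_mapping_region(index_pos, ld_block=None):
    """Return (start, end) of the fine-mapping window around an index SNP.

    `ld_block` is an optional (start, end) pair for the LD block containing
    the index SNP; blocks longer than 250 kb extend the window (assumed
    reading of the rule in the text).
    """
    start, end = index_pos - FLANK_BP, index_pos + FLANK_BP
    if ld_block is not None:
        ld_start, ld_end = ld_block
        if ld_end - ld_start > FLANK_BP:
            start, end = min(start, ld_start), max(end, ld_end)
    return max(start, 0), end


print(fine_mapping_region(1_000_000))                        # (750000, 1250000)
print(fine_mapping_region(1_000_000, (600_000, 1_100_000)))  # (600000, 1250000)
```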

Stepwise  regression  was  performed  by  region  to  select  the 
most  informative  risk  variants  as  discussed  in  what  follows, 
in  models  adjusted  for  age,  study,  global  ancestry  (the  first 
10  eigenvectors)  and  local  ancestry.  In  the  stepwise  regression, 
we preserved the original sample size by using the mean genotype
of typed subjects in place of 'no-calls' for SNPs with
<100%  genotyping  completion  rate. 

Within  each  known  risk  locus,  it  is  expected  that  markers 
that are associated with risk in African Americans will be correlated
with the index signal reported in Europeans. Thus, we
identified and tested SNPs that are correlated (r2 > 0.2) with
the index signals in the GWAS populations (HapMap CEU
or  CHB  for  6q25).  For  each  region,  we  determined  the 
number  of  tags  needed  to  capture  all  the  SNPs  correlated 
with  the  index  signal  in  the  YRI  population  (Phase  2 
HapMap).  The  average  number  of  tags  in  each  region  was 
then  used  as  the  correction  factor  for  Bonferroni  correction. 
An α-level of 0.05 divided by average number of tags
needed in each region was applied in the stepwise regression
process. For all of the remaining markers that were not correlated
with the index signal (in Europeans), we applied a more
stringent α-level for defining statistical significance. In each
risk  region,  we  determined  the  number  of  tag  SNPs  needed 
to  capture  all  common  alleles  (MAF  >  0.05,  with  r2>  0.8) 
in  the  YRI  HapMap  population.  The  total  number  of  tags 
across  the  19  regions  was  then  used  as  a  correction  factor, 
as  they  define  the  number  of  independent  tests  in  each 
region. An α of 0.05 divided by the number of tags was
applied to assess statistical significance for any putative novel,
independent signal in each region. For correlated SNPs that
were selected to be better markers, we also assessed phase
to ensure that the new risk allele is on the same haplotype
as the GWAS-reported risk allele in the HapMap CEU
population. 
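
The two significance thresholds described above are both Bonferroni-style corrections and can be sketched as follows. The tag counts in the example are invented for illustration, since the actual counts are not given in this section, and reading the second threshold as 0.05 divided by the total number of tags across regions is an interpretation of the text.

```python
def alpha_for_correlated_snps(tags_per_region, alpha=0.05):
    """Threshold for SNPs correlated with the index signal.

    Uses the average number of tags per region as the correction factor.
    """
    average_tags = sum(tags_per_region) / len(tags_per_region)
    return alpha / average_tags


def alpha_for_novel_signals(total_tags, alpha=0.05):
    """Stricter threshold for SNPs uncorrelated with the index signal."""
    return alpha / total_tags


# Hypothetical tag counts, for illustration only.
print(alpha_for_correlated_snps([10, 20, 30]))  # 0.05 / 20
print(alpha_for_novel_signals(1000))            # 0.05 / 1000
```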

Risk  modeling.  We  modeled  the  cumulative  genetic  risk  of 
breast  cancer  using  the  risk  variants  reported  in  previous 
GWAS  (total  =  19).  We  compared  the  results  with  a  model 
of  the  SNPs  found  to  be  significantly  associated  with  risk  in 
African  Americans,  which  included  SNPs  identified  from  the 
stepwise  procedures  at  all  loci  for  overall  breast  cancer  risk 
(presented  in  Table  1).  More  specifically,  in  each  case  we 
summed  the  number  of  risk  alleles  for  each  individual  and 
estimated  the  OR  per  allele  for  this  aggregate-unweighted 
allele  count  variable  as  an  approximate  risk  score  appropriate 
for unlinked variants with independent effects of approximately
the same magnitude for each allele. We then applied this
risk score to overall breast cancer as well as ER+/ER-
breast cancer subtypes. We also constructed risk
scores based on risk alleles for ER+ and ER- tumor subtypes
separately, and, as hypothesis-generating, applied both risk
scores to overall and ER+/ER- breast cancer subtypes.
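
The unweighted score described above is simply a count of risk alleles, and under the stated assumption of independent, equal per-allele effects the odds scale multiplicatively with that count. The sketch below illustrates this using the per-allele OR of 1.18 reported for the overall score; the genotype vector and function names are hypothetical.

```python
def risk_allele_count(genotypes):
    """Unweighted risk score: total number of risk alleles carried.

    `genotypes` holds 0, 1 or 2 risk alleles per selected variant.
    """
    return sum(genotypes)


def relative_odds(count, reference_count, per_allele_or):
    """Odds relative to a reference count, assuming equal per-allele effects."""
    return per_allele_or ** (count - reference_count)


carrier = [0, 1, 2, 1]                  # hypothetical genotypes at 4 variants
print(risk_allele_count(carrier))       # 4
print(relative_odds(12, 10, 1.18))      # about 1.39 for two extra risk alleles
```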

SUPPLEMENTARY  MATERIAL 

Supplementary Material is available at HMG online.

ABSTRACT


Genome-wide  association  studies  (GWAS)  of  breast  cancer  defined  by  hormone  receptor  status  have 
revealed  loci  contributing  to  susceptibility  of  estrogen  receptor  (ER)-negative  subtypes.  To  identify 
additional  genetic  variants  for  ER-negative  breast  cancer  we  conducted  the  largest  meta-analysis  of  ER- 
negative disease to date, comprising 4,754 ER-negative cases and 31,663 controls from three GWAS:
NCI Breast and Prostate Cancer Cohort Consortium (BPC3) (2,188 ER-negative cases; 25,519 controls of
European ancestry), Triple Negative Breast Cancer Consortium (TNBCC) (1,562 triple negative cases;
3,399 controls of European ancestry) and African American Breast Cancer Consortium (AABC) (1,004
ER-negative cases; 2,745 controls). We performed in silico replication of 86 SNPs at P < 1 × 10^-5 in an
additional 11,209 breast cancer cases (946 with ER-negative disease) and 16,057 controls of Japanese,
Latino and European ancestry. We identified two novel loci for breast cancer at 20q11 and 6q14. SNP
rs2284378 at 20q11 was associated with ER-negative breast cancer (combined two stage OR=1.16;
P = 1.1 × 10^-8) but showed a weaker association with overall breast cancer (OR=1.08, P = 1.3 × 10^-6) based on
17,869 cases and 43,745 controls and no association with ER-positive disease (OR=1.01, P = 0.67) based
on 9,965 cases and 22,902 controls. Similarly, rs17530068 at 6q14 was associated with breast cancer
(OR=1.12; P = 1.1 × 10^-9), and with both ER-positive (OR=1.09; P = 1.5 × 10^-5) and ER-negative (OR=1.16,
P = 2.5 × 10^-7) disease. We also confirmed three known loci associated with ER-negative (19p13) and both
ER-negative and ER-positive breast cancer (6q25 and 12p11). Our results highlight the value of large-
scale  collaborative  studies  to  identify  novel  breast  cancer  risk  loci. 


INTRODUCTION 


Breast  cancer  is  a  heterogeneous  disease  and  has  multiple  histological  and  molecular  subtypes,  likely  with 
distinct  etiologies.  Tumors  that  lack  expression  of  the  estrogen  receptor  (ER)  tend  to  have  more 
aggressive  disease,  higher  histological  grade,  and  lower  survival  rates  (1).  ER-negative  breast  cancer  is 
more  common  in  women  of  African  ancestry,  accounting  for  as  much  as  40%  of  cases  in  African 
American  women  compared  with  15-20%  in  women  of  European  ancestry.  The  etiologic  heterogeneity 
between  breast  cancer  subtypes  is  supported  by  different  associations  with  ER-positive  versus  ER- 
negative  disease  for  many  of  the  known  breast  cancer  risk  factors  (such  as  reproductive  factors  and 
BMI)(2).  Tumors  in  women  with  BRCA1  mutations  are  predominantly  ER-negative,  while  tumors  in 
BRCA2  mutation  carriers  are  predominantly  ER-positive(3).  Furthermore,  genome-wide  association 
studies  have  identified  multiple  common  genetic  variants  more  strongly  associated  with  ER-positive  than 
ER-negative breast cancer(4). Through collaborative efforts, we recently identified risk loci on 5p15 and
19p13 that are associated specifically with ER-negative and triple negative (TN) (ER-negative,
progesterone receptor (PR)-negative and HER2-negative) breast cancer(5-7).

In  order  to  identify  genetic  loci  associated  with  risk  of  ER-negative  breast  cancer,  we  conducted  a 
meta-analysis  of  three  GWAS  of  ER-negative  breast  cancer,  comprising  4,754  cases  and  31,663  controls 
with further replication in an additional 11,209 cases (946 with ER-negative disease) and 16,057 controls.


RESULTS 

The  meta-analysis  included  GWAS  of  ER-negative  breast  cancer  (4,754  ER-negative  cases  and  31,663 
controls)  from  the  NCI  Breast  and  Prostate  Cancer  Cohort  Consortium  (BPC3)  (2,188  ER-negative  cases 
and  25,519  controls  of  European  ancestry),  the  Triple  Negative  Breast  Cancer  Consortium  (TNBCC) 
(1,562  triple  negative  cases  and  3,399  controls  of  European  ancestry)  and  the  African  American  Breast 
Cancer  Consortium  (AABC)  (1,004  ER-negative  cases  and  2,745  controls).  (Figure  1,  Supplementary 
Table 1). We observed little evidence of over-inflation in the test statistics (λ < 1.04 for each study;
λ = 1.04 for meta-analysis) (Supplementary Figure 1). A total of 86 SNPs were associated with ER-
negative breast cancer at P < 10^-5 (Supplementary Table 2). An in silico replication of the 86 SNPs was
conducted using GWAS of European (BCAC combined), Latino (MEC-LAT, SFBCS/NC-BCFR) and
Japanese (MEC-JPT) ancestry populations, totaling 11,209 breast cancer cases (946 with ER-negative
disease)  and  8,404  controls  (Stage  2)(Supplementary  Table  1). 
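
The inflation factor quoted above is conventionally computed as the median of the observed 1 d.f. χ2 statistics divided by the median of the χ2 distribution with 1 d.f. (approximately 0.455). The text does not say how λ was calculated, so the sketch below assumes this standard definition; the input values are invented.

```python
from statistics import median

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(chi2_stats):
    """Genomic inflation factor: observed median over the expected median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


# Invented statistics for illustration; a real scan would supply one per SNP.
print(round(genomic_inflation([0.1, 0.4549364, 3.0]), 3))  # 1.0
print(round(genomic_inflation([0.2, 0.4731, 1.5]), 2))     # 1.04
```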

Combining  results  for  ER-negative  breast  cancer  from  stages  1  and  2,  variants  in  three  regions 
showed genome-wide significance [20q11-rs2284378, T allele: odds ratio, OR=1.16, P = 1.1 × 10^-8 (Table
1); 19p13-rs8100241, G allele: OR=1.14, P = 3.5 × 10^-8; 6q25-rs9383938, T allele: OR=1.28, P = 2.37 × 10^-10].
Variants at 6q25 have previously been associated with breast cancer risk(8), and variants at the
19p13 locus have been associated with ER-negative and TN breast cancer risk(5, 7). The rs2284378
variant at 20q11 is located in a region containing RALY (RNA binding protein, autoantigenic), EIF2S2
(eukaryotic translation initiation factor 2, subunit 2 beta) and ~100 kb upstream of ASIP (agouti signaling
protein), and is in high linkage disequilibrium (r2=0.96 and D'=1) with rs4911414, which has been
associated with melanoma and basal cell carcinoma(9) (Supplementary Figure 2). The T allele at
rs2284378 was associated with an increased ER-negative breast cancer risk (OR>1) in all racial/ethnic
populations, except Japanese (OR=0.99) (Table 1). However, this group had the smallest sample size.
Furthermore,  no  significant  evidence  of  heterogeneity  was  observed  by  race  (P=0.28)  or  study  (P=0.54) 
(Table  1,  Supplementary  Table  3).  When  the  study  was  extended  to  include  all  available  breast  cancer 
cases  (ER-positive  and  ER-negative)  and  controls  from  the  participating  GWAS,  rs2284378  showed  a 
weaker  association  with  overall  breast  cancer  (OR=1.08,  P=1.3xl0'6  based  on  17,868  cases  and  43,744 
controls;  Table  1)  and  no  evidence  for  association  with  ER-positive  disease  (OR=1.01,  P= 0.67  based  on 
9,965  cases  and  22,902  controls  (Supplementary  Table  5).  A  case-only  analysis  of  ER-negative  versus 
ER-positive  breast  cancer  indicated  a  highly  significant  difference  in  ORs  by  ER  status  (T’=1.3xl0  4, 
Supplementary  Table  5).  Furthermore,  rs2284378  appeared  more  strongly  associated  with  triple 
negative  (TN)  breast  cancer  (OR=1.16;  P=6.4xl0'3),  than  ER-negative,  PR-negative,  HER2 -positive 
breast  cancer  (OR=1.07,  P=0.41),  although  these  differences  were  not  statistically  significant  (case-only 
P=0.44) (Supplementary Table 5).

Next, we examined the associations between all candidate loci from stage 1 (n=86 SNPs) and
overall breast cancer risk using all available breast cancer cases and controls from the studies
in stages 1 and 2 (Figure 1). We identified genome-wide statistically significant associations
with variants at 6q25 (rs9383938, T allele: OR=1.20; P=8.7×10⁻¹⁴) and at a recently reported risk
locus near the PTHLH gene at 12p11 (rs1975930, T allele: OR=1.22; P=1.4×10⁻¹³) (10). In addition,
we observed genome-wide significant associations with multiple variants in a gene desert located
at 6q14. Allele C of rs17530068 at 6q14 was associated with increased risk of overall breast
cancer (OR=1.12; P=1.1×10⁻⁹) (Table 2, Supplementary Figure 3, Supplementary Table 4) and of both
ER-positive (OR=1.09; P=1.5×10⁻⁵) (Supplementary Table 6) and ER-negative (OR=1.16, P=2.5×10⁻⁷)
(Table 2) breast cancer. We observed no evidence of risk heterogeneity for rs17530068 by ER status
(case-only analysis P=0.53) (Supplementary Table 6), study (Phet=0.16), or race/ethnicity
(Phet=0.30) (Table 2). Furthermore,
rs17530068 appeared more strongly associated with ER-negative, PR-negative, HER2-positive breast
cancer (OR=1.26, P=8.0×10⁻[exponent illegible]) than with TN breast cancer (OR=1.12, P=0.07),
although these differences were not statistically significant (case-only P=0.17) (Supplementary
Table 6).

We also evaluated associations for 25 known breast cancer risk markers in European-ancestry
women from our study (Supplementary Table 7 and Supplementary Figure 4). In our samples, 8 of the
13 markers previously associated with both ER-negative and ER-positive disease or with ER-negative
disease only (TERT and 19p13.1) were nominally significantly associated (P<0.05) with ER-negative
disease. In contrast, none of the 10 markers previously associated with ER-positive disease only
were associated with ER-negative disease. A risk score formed by summing the risk alleles at all
25 previously identified loci was significantly associated with ER-negative disease in our study
(OR=1.06 (1.04-1.07); P=2.9×10⁻¹⁴). Risk scores for subsets of markers associated with ER-negative
disease only (2 markers) or both ER-negative and ER-positive disease (11 markers) were also
significantly associated with ER-negative disease (OR=1.22 (1.14-1.31), P=1.0×10⁻⁸ and OR=1.08
(1.05-1.10), P=9.5×10⁻¹², respectively). A risk score for the subset of loci previously associated
with ER-positive disease only (10 markers) was not associated with risk of ER-negative disease
(OR=1.02 (1.00-1.04), P=0.08). These score results provide some confirmation of earlier results
and an estimate of the effects of previously identified breast cancer risk markers on risk of
ER-negative disease.

DISCUSSION 

We present results from the largest meta-analysis to date to specifically focus on ER-negative
disease. We identify two novel loci for breast cancer: 20q11, associated with ER-negative and
triple negative but not ER-positive breast cancer, and 6q14, associated with both ER-positive and
ER-negative breast cancer. In addition, we confirm three known regions previously associated with
ER-negative (19p13) or ER-negative and ER-positive breast cancer (6q25 and 12p11). Correction for
genomic control results in similar but attenuated findings for 20q11-rs2284378 (PGC=2.4×10⁻⁸) and
6q14-rs17530068 (PGC=3.2×10⁻[exponent illegible]).

The novel association at 20q11 with ER-negative breast cancer spans the ASIP, RALY and
EIF2S2 genes. Agouti signaling protein (product of the ASIP gene) was first described to inhibit
melanogenesis in human melanocytes in 1997 (11). ASIP is a melanocortin 1 receptor (MC1R) ligand
that antagonises the function of the transmembrane receptor (12). The variants we identified at
20q11 for breast cancer are highly correlated with variants previously associated with
pigmentation traits as well as risk of both cutaneous melanoma and basal cell carcinoma (9),
suggesting a possible biological link between these cancers. Further studies have confirmed the
importance of the genetic variation spanning the ASIP locus, where a variant at 20q11 showed the
strongest association with pigmentation and was probably in linkage disequilibrium (LD) with
variants within an ASIP regulatory region (13). EIF2S2 encodes eukaryotic translation initiation
factor 2, subunit 2 beta, which is involved in early steps of protein synthesis by forming a
ternary complex with GTP and initiator tRNA. The deletion of Eif2s2 has been associated with
suppression of testicular germ cell tumor incidence and recessive lethality in mice (14). The
agouti-yellow (Ay) deletion is a genetic modifier known to suppress testicular germ cell tumor
susceptibility in mice and humans. The Ay mutation deletes both RALY and Eif2s2, and induces the
ectopic expression of agouti, all of which are potential testicular germ cell tumor-modifying
variations (14). Both RALY and EIF2S2 are expressed in many tissues including mammary gland (15).
SNP rs2284378 was not consistently associated with expression of EIF2S2, RALY, or ASIP in
lymphocytes (11), adipocytes or skin cells (16), although there was marginal evidence for
association between rs2284378 and EIF2S2 expression in one study (16) (Supplementary Table 8).
However, several SNPs in high linkage disequilibrium with SNP rs2284378 (r²>0.8) within a 1 Mb
region were significantly associated with expression of the nearby genes EIF2S2 and RALY.
rs4911379 (r²=0.96) is statistically significantly associated with EIF2S2 expression in
fibroblasts (P=3.6×10⁻⁴) (17), and SNPs rs761238 and rs761236 (r²=0.85) are associated with RALY
expression in lymphocytes (P=8.3×10⁻⁴) (16). An additional 13 SNPs (r²>0.85) have been associated
with expression of RALY, GGTL3, DYNLRB1, and AK054906 in liver cells, monocytes and
lymphoblastoid cell lines (Supplementary Table 9). In addition to expression, several enhancer as
well as promoter regions defined by overlapping chromatin marks in human mammary epithelial cells
were found at 20q11 (Supplemental Figure 5). SNPs in high LD with rs2284378 (r²>0.7), such as
rs4911395, rs4911396 and rs1007090, are located in the promoter region of RALY. SNPs rs6142101,
rs6087557, and rs4911408 (r²>0.7) are present in the promoter region of EIF2S2, and rs1054534,
rs1555075, rs2268086, rs2268088, rs4911401, rs2284388, rs2284389 and rs932388 are located in
predicted enhancer regions in introns of RALY. Thus, variants at 20q11 may influence expression
of multiple genes in mammary epithelial cells, as has been seen in prostate cancer (18).
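Because much of the argument above rests on pairwise LD thresholds (r²>0.7, r²>0.8, D'=1), a minimal sketch of how D, D' and r² follow from haplotype and allele frequencies may help. The function name and the frequencies in the example are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and allele B (locus 2)
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical frequencies: allele B (30%) occurs only on haplotypes that carry allele A (31%).
d, d_prime, r2 = ld_stats(p_ab=0.30, p_a=0.31, p_b=0.30)
print(f"D={d:.3f}  D'={d_prime:.2f}  r2={r2:.2f}")
```

With these illustrative inputs D' reaches its maximum while r² stays just below 1, which is the pattern expected when two variants sit on the same haplotype but differ slightly in frequency.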

In contrast, rs17530068 at 6q14 is located in a gene desert with no evidence of an open/active
regulatory region in human mammary epithelial cells (Supplementary Figure 6). The closest gene
(~262 kb), family with sequence similarity 46, member A (FAM46A/C6orf37), encodes a protein of
unknown function. Five SNPs in this region in low linkage disequilibrium with SNP rs17530068
(r²<0.02) were associated with expression of IBTK in lymphoblastoid cell lines (Supplementary
Table 10). Additional studies of both of these novel regions will be necessary to identify the
underlying biologically relevant variant/s.

SNP rs17530068 at chromosome 6q14 was associated with overall breast cancer risk and showed
no differential association depending on ER status. The association of SNP rs2284378 at 20q11,
however, was stronger for ER-negative than ER-positive breast cancer. This finding underscores
the importance of investigating genetic variants for specific subtypes of breast cancer, as this
locus had not been previously identified in the many GWAS of breast cancer to date that did not
focus on this specific breast cancer subtype. The etiology of ER-negative disease is largely
unknown. Identifying new loci associated with ER-negative and TN breast cancer will continue to
provide insight into the biological mechanisms underlying this more aggressive form of breast
cancer, and could result in improvements in risk prediction and treatment.

MATERIALS  AND  METHODS 
Study  populations 

Stage 1 included the studies of the NCI Breast and Prostate Cancer Cohort Consortium (BPC3),
Triple Negative Breast Cancer Consortium (TNBCC) and African American Breast Cancer Consortium
(AABC). The BPC3 study includes 2,188 ER-negative cases and 25,519 controls, AABC includes 3,153
cases (1,004 ER-negative) and 2,745 controls from 9 studies, and TNBCC includes 1,562 cases and
3,399 controls from 15 studies (Supplementary Table 1). Replication studies include 886 cases (84
ER-negative) and 830 controls from a GWAS of breast cancer in Japanese (MEC-JPT) women and 546
cases (112 ER-negative) and 558 controls from a GWAS of breast cancer in Latino (MEC-LAT) women
in the Multiethnic Cohort (MEC), 992 cases (188 ER-negative) and 640 controls from the San
Francisco Bay Area Breast Cancer Study (SFBCS) and the Northern California Breast Cancer Family
Registry (NC-BCFR), and 8,785 cases (562 ER-negative) and 14,029 controls from eight combined
GWAS of breast cancer from BCAC. All participants in these studies provided written consent for
the research, and approval for the study was obtained from the ethical review boards of all local
institutions. A description of each participating study is provided in the supplementary
material.

Stage 1 genotyping and quality control

Genotyping in AABC was conducted using the Illumina Human1M-Duo BeadChip. Of the 5,984 samples
in the AABC Consortium (3,153 cases and 2,831 controls), we attempted genotyping of 5,932,
removing samples (n=52) with DNA concentrations <20 ng/ul. Following genotyping, we removed
samples based on the following exclusion criteria: 1) unknown replicates (>98.9% genetically
identical) that we were able to confirm (n=15); 2) unknown replicates for which the pair or
triplicate was removed (n=14); 3) samples with call rates <95% after a second attempt (n=100);
4) samples with <5% African ancestry (n=36) (discussed below); and 5) samples with <15% mean
heterozygosity of SNPs in the X chromosome and/or similar mean allele intensities of SNPs on the
X and Y chromosomes (n=6). In the analysis, we removed SNPs with <95% call rates (n=21,732) or
minor allele frequencies (MAFs) <1% (n=80,193). The concordance rate for blinded duplicates was
99.95%. We also eliminated SNPs with genotyping concordance rates <98% based on the replicates
(n=11,701). The final analysis dataset included 1,043,036 SNPs genotyped on 3,016 cases (988
ER-negative, 1,520 ER-positive, and the remaining 508 cases with unknown ER status) and 2,745
controls, with an average SNP call rate of 99.7% and average sample call rate of 99.8%.
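The SNP-level filters described above (call rate and minor allele frequency) can be sketched as follows. This is an illustrative sketch only, assuming genotypes coded as 0/1/2 copies of one allele with `None` for a failed call; the function name and the toy data are hypothetical, and the default thresholds simply mirror the AABC values quoted in the text.

```python
def snp_qc(genotypes, min_call_rate=0.95, min_maf=0.01):
    """Return (call_rate, maf, keep) for one SNP.

    genotypes: list of allele counts (0, 1 or 2) per sample, None where the call failed.
    """
    called = [g for g in genotypes if g is not None]
    call_rate = len(called) / len(genotypes)
    if not called:
        return call_rate, 0.0, False
    freq = sum(called) / (2 * len(called))  # frequency of the counted allele
    maf = min(freq, 1 - freq)               # minor allele frequency
    keep = call_rate >= min_call_rate and maf >= min_maf
    return call_rate, maf, keep


# Toy data: 100 samples, 2 failed calls, 8 heterozygotes.
example = [0] * 90 + [1] * 8 + [None] * 2
print(snp_qc(example))
```

In practice such filters are applied per SNP across the whole array, and the sample-level exclusions (replicates, ancestry, sex checks) are handled separately.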

Genotyping for the TNBCC GWAS was conducted on 1,718 cases from 10 studies (ABCTB,
BBCC, DFCI, FCCC, GENICA, MARIE, MCBCS, MCCS, POSH, SBCS) using the Illumina 660-Quad
SNP array. In addition, a subset of MARIE cases (n=52) was genotyped using the Illumina CNV370
SNP array. HEBCS cases (n=85) were genotyped using the Illumina 550 SNP array, and population
allele and genotype frequencies for healthy population controls (n=222), genotyped on the
Illumina 370 SNP array, were obtained from NordicDB, a Nordic pool and portal for genome-wide
control data (19), from the Finnish Genome Center. GWAS data for public controls (n=3,448) were
generated using the following arrays: Illumina 660-Quad SNP array (QIMR), Illumina 550 SNP array
(CGEMS), Illumina 550 SNP array (KORA), and Illumina 1.2M (WTCCC). These GWAS data were
independently evaluated by an iterative QC process with the following exclusion criteria: minor
allele frequency (MAF) <0.01, call rate <95%, HWE P-value <1×10⁻⁷ among controls and sample call
rate <98%. In total, we excluded previously unknown replicates (n=2), samples with call rates
<98% (n=83), samples that failed sex check (n=10), cases identified as non-triple-negative breast
cancer (n=20) and related samples (n=27).

We removed SNPs with <95% call rates or MAF <5%. Because a number of our samples were
genotyped at different locations, we removed SNPs if there was a difference >0.10 between the
study allele frequency and the median frequency across all studies. Eigensoft software, which
uses principal component analysis (PCA), was used to evaluate confounding due to population
stratification. We removed 101 subjects that did not cluster with the CEU HapMap Phase 2
samples, and a further 179 controls that overlapped with CGEMS/NHS controls in BPC3, resulting in
1,562 cases and 3,399 controls in the GWAS analyses.
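The Hardy-Weinberg filter applied to controls can be illustrated with a one-degree-of-freedom chi-square test on genotype counts. The text does not state which HWE test was used, so the chi-square form below is an assumption (an exact test is a common alternative); the function name and the counts are hypothetical.

```python
import math


def hwe_chisq(n_aa, n_ab, n_bb):
    """Chi-square test (1 d.f.) of Hardy-Weinberg proportions; returns (chi2, p_value)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical control genotype counts with a deficit of heterozygotes.
chi2, p_value = hwe_chisq(300, 400, 300)
flagged = p_value < 1e-7  # threshold quoted above for the TNBCC controls
print(chi2, p_value, flagged)
```

A SNP would be dropped when the P-value in controls falls below the chosen threshold.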

BPC3 GWAS genotyping was conducted at three genotyping centers (NCI Core Genotyping
Facility, USA; University of Southern California, USA; and Imperial College London, UK). Subjects
from CPSII, EPIC, MEC, PLCO, and PBCS were genotyped using the Illumina Human 660k-Quad SNP
array (Illumina, Inc); NHSI/NHSII and part of the PLCO study were genotyped previously using the
Illumina Human 550 SNP array (Illumina, Inc) (20). SNPs were filtered and removed based on
deviations from Hardy-Weinberg proportions in control subjects (p<10e-5), autosomal SNPs with MAF
of less than 5% and completion rate less than 95%. Samples were excluded based on genotyping call
rates less than 95% (n=195), extreme heterozygosity (n=35), sex discordance (n=3), and unexpected
duplicates and relatedness (n=6). Subjects with evidence of significant non-European ancestry and
population structure were also excluded. Non-European ancestry was assessed utilizing a subset of
unlinked, population informative SNPs (21). Individuals determined to have less than 80% European
ancestry were excluded from future analyses (n=16). The average concordance rate of blinded
duplicates was 99.95%. In order to resolve a more detailed population substructure, PCA was
conducted using the struct.pca module of GLU (http://code.google.com/p/glu-genetics/). PCA was
only performed in subjects with over 80% European ancestry. Furthermore, 958 controls from NHS
(CGEMS) were removed from BPC3 analyses due to overlap between TNBCC and BPC3 studies. After all
exclusions, 1,998 cases and 2,305 controls contributed to the stage 1 analysis.

The WHS cohort subjects in BPC3 were previously genotyped using the Human-Hap300 Duo-plus
BeadChip (22). Among the final 23,294 individuals of verified European ancestry, genotypes for a
total of 2,608,509 SNPs were imputed from the experimental genotypes and LD relationships implicit
in the HapMap r. 22 CEU samples. WHS contributed 190 cases and 23,214 control subjects to stage 1.
WHS was meta-analyzed with the remaining BPC3 studies, for a BPC3 total of 2,188 cases and 25,519
control subjects in the stage 1 analysis.

SNPs rs2284378 and rs17530068 were genotyped in all stage 1 studies.

Stage  2  genotyping  and  quality  control 

The San Francisco Bay Area Breast Cancer Study (SFBCS) (23) and the Northern California Breast
Cancer Family Registry (NC-BCFR) (24) study samples were genotyped with the Affymetrix 6.0 array
according to the manufacturer's instructions (https://www.affymetrix.com) in the laboratory of
Esteban Gonzalez Burchard at UCSF. A total of 15 cases and 30 controls that had a genotyping call
rate <95% or showed either known or cryptic relatedness were excluded from the SFBCS and NC-BCFR
sample set. The final sample included in the analysis was 992 cases (188 ER-negative cases) and
640 controls. Imputation was conducted with the program BEAGLE, with all unrelated HapMap Phase
II samples included as references (http://hapmap.ncbi.nlm.nih.gov).

GWAS of breast cancer in Latino (MEC-LAT) and Japanese (MEC-JPT) samples from the MEC
were genotyped with the Illumina 660W array at USC. For MEC-LAT, we excluded 48 samples from the
MEC that had a genotyping call rate of <95% and 34 that showed either known or cryptic
relatedness. The final MEC-LAT sample included 546 cases (112 ER-negative) and 558 controls. With
similar exclusions, the final MEC-JPT sample included 886 cases (84 ER-negative) and 830
controls.

The BCAC combined GWAS includes primary genotype data from eight breast cancer GWAS in
populations of European ancestry (ABCFS, BBCS, GC-HBOC, MARIE, HEBCS, SASBAC, UK2,
DFBBCS). All studies were genotyped with various versions of Illumina arrays, except GC-HBOC,
which was performed with the Affymetrix 5.0 (cases) and 6.0 (controls) arrays. Standard QC was
performed on all scans. Specifically, all individuals with low call rate (<95%), extreme high or
low heterozygosity (P<10⁻⁵), and all individuals evaluated to be of non-European ancestry (>15%
non-European component, by multidimensional scaling using the three HapMap2 populations as a
reference) were excluded. SNPs with call rate <95%, SNPs with call rate <99% and MAF<5%, all SNPs
with MAF<1%, and SNPs with genotype frequencies departing from Hardy-Weinberg equilibrium at
P<10⁻⁶ in controls or P<10⁻¹² in cases were also excluded. Data were imputed for ~2.6M SNPs for
all scans using MACH v1.0 with HapMap version 2 CEU as a reference. BBCS and UK2 used the same
control data (WTCCC2). These studies were imputed separately. For the combined analysis, the
control set was divided randomly between the two studies, in proportion to the size of case
series, to provide disjoint strata. Estimated per-allele ORs and standard errors were generated
from the imputed genotypes using Probabel (25).

SNPs rs2284378 and rs17530068 were genotyped in all stage 2 studies except SFBCS and
NC-BCFR, where they were imputed. Both SNPs were genotyped by TaqMan in 483 samples from these
studies, and genotype concordance versus imputed genotypes was 93.3% for rs2284378 and 94.9% for
rs17530068.


TaqMan genotyping in BPC3 for SNP rs2284378 and SNP rs17530068

In BPC3, genotyping of SNP rs2284378 and rs17530068 was performed for all available breast
cancer cases and controls by TaqMan in four laboratories (CPS-II and MEC at the University of
Southern California; NHS and WHS at Harvard University; EPIC at the German Cancer Research Center
in Heidelberg; and PLCO at the NCI/Core Genotyping Facility). All studies typed SNP rs17530068;
however, for SNP rs2284378, PLCO and CPS-II typed a proxy SNP, rs6059651 (r²=1, D'=1). The
concordance of the TaqMan genotyping data with that generated from Illumina for stage 1
ER-negative cases and controls was 0.997 for rs17530068 and 0.986 for rs2284378 for CPS-II, MEC,
NHS, EPIC and PLCO. The genotype concordance versus imputed for WHS was 95% for rs2284378 and 97%
for rs17530068.


Statistical analysis

In AABC, we tested for gene dosage effects in models adjusted for age, study and eigenvectors
1-10. Odds ratios (OR) and 95% confidence intervals (95% CI) were estimated using unconditional
logistic regression. In TNBCC, unconditional logistic regression was used to assess single SNP
associations, also assuming a log-additive model, adjusting for country and the first two
principal components. In BPC3, an unconditional logistic regression model was used to assess
single SNP associations, adjusting for age categories and the top 6 eigenvectors.
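To make the log-additive (gene dosage) model concrete, the sketch below fits an unadjusted logistic regression of case status on allele dosage by Newton-Raphson and reports the per-allele OR with a 95% CI. It is a simplified illustration: the covariates used in the studies (age, study or country, eigenvectors) are omitted, and the function name and the toy data are hypothetical.

```python
import math


def fit_log_additive(dosages, status, n_iter=25):
    """Fit logit P(case) = b0 + b1 * dosage; return (OR, lower, upper) per allele."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(dosages, status):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # score for the intercept
            g1 += (y - p) * x      # score for the dosage coefficient
            h00 += w               # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton-Raphson update: beta <- beta + I^-1 * score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    se = math.sqrt(h00 / det)  # SE of b1 from the inverse information matrix
    return math.exp(b1), math.exp(b1 - 1.96 * se), math.exp(b1 + 1.96 * se)


# Toy data: 100 controls (40 carry one copy) followed by 100 cases (60 carry one copy).
dosages = [0] * 60 + [1] * 40 + [0] * 40 + [1] * 60
status = [0] * 100 + [1] * 100
odds_ratio, lower, upper = fit_log_additive(dosages, status)
print(odds_ratio, lower, upper)
```

With dosages restricted to 0 and 1, as in this toy example, the fitted per-allele OR coincides with the cross-product ratio of the 2x2 table, which gives a simple check on the fit.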

In  both  AABC  and  TNBCC,  phased  haplotype  data  from  the  founders  of  the  CEU  and  YRI 
HapMap  Phase  2  samples  (build  21)  were  used  to  infer  LD  patterns  in  order  to  impute  untyped  markers. 
For  BPC3,  Hapmap  Phase  2  (release  21)  and  Hapmap  Phase  3  were  used  to  impute  untyped  markers.  For 
all  studies,  genome-wide  imputation  was  carried  out  using  the  software  MACH.  Filtered  from  the 
analysis  were  SNPs  with  Rsq<0.3  and  MAF  <1%. 

We conducted a fixed effect meta-analysis of AABC, TNBCC and BPC3 using the inverse
variance weighted method. The numbers of SNPs available for meta-analysis from AABC, TNBCC and
BPC3 in stage 1 were 3,055,415, 2,134,490 and 2,453,207, respectively. The union of these three
data sets was meta-analyzed using the program METAL. We conducted in silico replication of 86
SNPs with P-values <10⁻⁵ in stage 1 in the stage 2 studies, and a meta-analysis of these SNPs
from stages 1 and 2 for both ER-negative and overall breast cancer. P-values from our top two
loci were corrected for genomic inflation (PGC) using the lambda value from the overall
meta-analysis. Heterogeneity by study was evaluated using the Q-statistic. Case-only analyses
were performed to test for differences in the association by tumor subtypes, study and
race/ethnicity.
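The inverse-variance weighted fixed-effect estimate, Cochran's Q, and the genomic control correction described above can be sketched in a few lines. The inputs are the published Stage 1 ORs and 95% CIs for rs2284378 from Table 1; standard errors are back-calculated from the rounded CIs, so results are approximate. The lambda value is not reported in this section, so the value used below is purely hypothetical, and the function names are illustrative rather than those of any software used by the authors.

```python
import math

Z975 = 1.959964  # 97.5th percentile of the standard normal distribution


def se_from_ci(lower, upper):
    """Standard error of log(OR) recovered from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * Z975)


def fixed_effect_meta(studies):
    """studies: list of (OR, SE of log OR). Returns (pooled log OR, its SE, Cochran's Q)."""
    betas = [math.log(or_) for or_, _ in studies]
    weights = [1.0 / se ** 2 for _, se in studies]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return beta, math.sqrt(1.0 / w_sum), q


def p_from_z(z):
    """Two-sided P-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def genomic_control_p(z, lam):
    """P-value after dividing the 1 d.f. chi-square statistic (z squared) by lambda."""
    return math.erfc(math.sqrt(z * z / lam / 2))


# Stage 1 ER-negative results for rs2284378 (Table 1): BPC3, TNBCC, AABC.
stage1 = [
    (1.14, se_from_ci(1.05, 1.24)),
    (1.18, se_from_ci(1.07, 1.30)),
    (1.19, se_from_ci(1.03, 1.37)),
]
beta, se, q_stat = fixed_effect_meta(stage1)
pooled_or = math.exp(beta)
z = beta / se
p_het = math.exp(-q_stat / 2)            # chi-square survival function, valid for 2 d.f. only
p_gc = genomic_control_p(z, lam=1.04)    # lambda = 1.04 is a hypothetical value
print(round(pooled_or, 2), p_from_z(z), p_het, p_gc)
```

Run on these three studies, the pooled OR rounds to the Stage 1 value of 1.16 given in Table 1; the P-values differ somewhat from the published ones because the inputs are rounded.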

The association between risk scores of 25 previously identified breast cancer risk alleles and
risk of breast cancer in our samples was calculated using meta-regression, assuming the
per-allele odds ratio was constant across the markers analyzed. This is equivalent to combining
the summary log odds ratio estimates at independent loci using inverse-variance weighted
meta-analysis. The overlap between subjects contributing to this study and those contributing to
previous studies varied from marker to marker (e.g. the TNBCC contributed to the initial report
on rs8170 (5), and the BPC3 and TNBCC contributed to the initial report on the TERT locus (6)).
Thus, the results could be overestimates, since some of the studies here contributed to the
discovery of these 25 loci.


Functional  analysis 

Expression quantitative trait loci (eQTL) were assessed for all SNPs in the chromosome 6 and 20
loci using the GTEX database (http://www.ncbi.nlm.nih.gov/gtex/GTEX2/gtex.cgi), the University of
Chicago eQTL Browser (http://eqtl.uchicago.edu) and Genevar
(http://www.sanger.ac.uk/resources/software/genevar/) (26).

In an attempt to identify functionality at the two novel breast cancer risk loci, we used the
open-source R/Bioconductor package FunciSNP version 0.99 (27), which systematically integrates the
1,000 Genomes Project SNP data (April 2012 data release) with chromatin features of interest. For
each of the two novel breast cancer markers, we analyzed all SNPs with an r² value >0.5 with each
index SNP in the 1,000 Genomes Project EUR populations in a 1 Mb window around each index variant.
We assessed whether these SNPs were co-located with 12 different chromatin features generated by
next-generation sequencing technologies, which capture open chromatin regions, promoters, and
enhancers genome-wide in human mammary epithelial cells (HMEC), as well as known DNaseI
hypersensitive locations, FAIRE-seq peaks, and CTCF binding sites from more than 100 different
cell types, which were collected in ENCODE data (28). We utilized the UCSC Genome Browser
(http://genome.ucsc.edu/) to illustrate the correlated SNPs that overlap chromatin features, as
well as chromatin feature tracks (Supplemental
Figures 5-6).

Figure 1. Multi-stage study design.

Table 1. Association of SNP rs2284378 (T/C) at chromosome 20q11 and breast cancer risk by study and race/ethnicity

| Consortium/Study | Race/Ethnicity | Case/control (a) | RAF (T allele) (b) | OR (95% CI) (c) | P-value (d) | P Het-study/P Het-race (e) |
|---|---|---|---|---|---|---|
| Stage 1 ER-negative cases versus controls | | | | | | |
| BPC3 | European | 2,188/25,519 | 0.31 | 1.14 (1.05-1.24) | 0.0028 | |
| TNBCC | European | 1,478/3,345 | 0.33 | 1.18 (1.07-1.30) | 0.0010 | |
| AABC | African | 1,004/2,744 | 0.16 | 1.19 (1.03-1.37) | 0.020 | |
| Stage 1 | | 4,670/31,608 | | 1.16 (1.09-1.23) | 6.5×10⁻⁷ | 0.85/0.76 |
| Stage 2 ER-negative cases versus controls | | | | | | |
| BCAC Combined GWAS | European | 562/6,410 | 0.35 | 1.10 (0.96-1.25) | 0.17 | |
| MEC-JPT | Japanese | 84/830 | 0.26 | 0.99 (0.68-1.44) | 0.95 | |
| MEC-LAT | Latino | 112/553 | 0.29 | 1.27 (0.94-1.71) | 0.13 | |
| SFBCS/NC-BCFR | Latino | 188/611 | 0.29 | 1.45 (1.13-1.87) | 0.004 | |
| Stage 2 (ER-negative) | | 946/8,404 | | 1.16 (1.04-1.29) | 0.0048 | 0.98/0.12 |
| Stage 1+2 (ER-negative) | | 5,616/40,012 | | 1.16 (1.10-1.22) | 1.1×10⁻⁸ | 0.54/0.28 |
| All breast cancer cases versus controls | | | | | | |
| AABC | African | 3,016/2,745 | 0.16 | 1.06 (0.95-1.17) | 0.30 | |
| BCAC Combined GWAS | European | 8,785/10,142 | 0.35 | 1.04 (0.99-1.09) | 0.11 | |
| MEC-JPT | Japanese | 886/830 | 0.26 | 1.08 (0.91-1.24) | 0.46 | |
| MEC-LAT | Latino | 546/553 | 0.29 | 1.24 (1.03-1.48) | 0.021 | |
| SFBCS/NC-BCFR | Latino | 970/611 | 0.29 | 1.23 (1.05-1.44) | 0.011 | |
| Stage 2 (all cases) | | 14,202/14,880 | | 1.06 (1.02-1.10) | 0.0025 | 0.14/0.073 |
| Stage 1+2 (all cases) | | 17,869/43,745 | | 1.08 (1.05-1.12) | 1.3×10⁻⁶ | 0.056/0.19 |

(a) Number of cases and controls with genotype data for rs2284378.
(b) Risk Allele Frequency (RAF) in controls.
(c) Adjusted for age, study and principal components in AABC. Adjusted for age and country in TNBCC. Adjusted for age categories and top 6 eigenvectors in BPC3. Adjusted for age and top 10 eigenvectors in MEC-JPT, MEC-LAT and SFBCS/NC-BCFR studies. Combined analyses (Stage 1, Stage 2 and Stage 1+2) are from the meta-analysis.
(d) P for trend (1 d.f.).
(e) P for heterogeneity by study and race/ethnicity,
respectively.

Table 2. Association of SNP rs17530068 (C/T) at chromosome 6q14 and breast cancer risk by study and race/ethnicity

| Consortium/Study | Race/Ethnicity | Case/control (a) | RAF (C allele) (b) | OR (95% CI) (c) | P-value (d) | P Het-study/P Het-race (e) |
|---|---|---|---|---|---|---|
| Stage 1 ER-negative cases versus controls | | | | | | |
| BPC3 | European | 2,188/25,519 | 0.24 | 1.23 (1.12-1.35) | 2.23×10⁻⁵ | |
| TNBCC | European | 1,478/3,345 | 0.24 | 1.13 (1.02-1.26) | 0.023 | |
| AABC | African | 1,004/2,745 | 0.07 | 1.07 (0.86-1.34) | 0.54 | |
| Stage 1 | | 4,670/31,609 | | 1.17 (1.09-1.26) | 3.5×10⁻⁶ | 0.37/0.41 |
| Stage 2 ER-negative cases versus controls | | | | | | |
| BCAC combined GWAS | European | 562/6,410 | 0.22 | 1.09 (0.95-1.25) | 0.24 | |
| MEC-JPT | Japanese | 84/830 | 0.19 | 1.16 (0.79-1.71) | 0.45 | |
| MEC-LAT | Latino | 112/553 | 0.23 | 1.06 (0.75-1.50) | 0.73 | |
| SFBCS/NC-BCFR | Latino | 188/611 | 0.22 | 1.40 (1.07-1.84) | 0.014 | |
| Stage 2 (ER-negative) | | 946/8,404 | | 1.14 (1.02-1.28) | 0.022 | 0.41/0.52 |
| Stage 1+2 (ER-negative) | | 5,616/40,013 | | 1.16 (1.10-1.23) | 2.5×10⁻⁷ | 0.54/0.78 |
| All breast cancer cases versus controls | | | | | | |
| AABC | African | 3,016/2,745 | 0.07 | 1.04 (0.89-1.21) | 0.63 | |
| BCAC combined GWAS | European | 8,785/10,142 | 0.22 | 1.08 (1.02-1.14) | 0.0021 | |
| MEC-JPT | Japanese | 886/830 | 0.19 | 1.13 (0.96-1.34) | 0.14 | |
| MEC-LAT | Latino | 546/553 | 0.23 | 1.21 (0.99-1.47) | 0.056 | |
| SFBCS/NC-BCFR | Latino | 970/611 | 0.22 | 1.27 (1.07-1.51) | 0.006 | |
| Stage 2 (all cases) | | 14,203/14,881 | | 1.10 (1.05-1.15) | 1.8×10⁻⁵ | 0.31/0.20 |
| Stage 1+2 (all cases) | | 17,869/43,745 | | 1.12 (1.08-1.16) | 1.1×10⁻⁹ | 0.16/0.30 |

(a) Number of cases and controls with genotype data for rs17530068.
(b) Risk Allele Frequency (RAF) in controls.
(c) Adjusted for age, study and principal components in AABC. Adjusted for age and country in TNBCC. Adjusted for age categories and top 6 eigenvectors in BPC3. Adjusted for age and top 10 eigenvectors in MEC-JPT, MEC-LAT and SFBCS/NC-BCFR studies. Combined analyses (Stage 1, Stage 2 and Stage 1+2) are from the meta-analysis.
(d) P for trend.
(e) P for heterogeneity by study and race/ethnicity, respectively.

ABBREVIATIONS


ER=Estrogen  Receptor 
PR=Progesterone  Receptor 
SNP=Single  nucleotide  polymorphism 
GWAS=Genome-wide  Association  Study 
OR=Odds  Ratio 

BPC3=NCI  Breast  and  Prostate  Cancer  Cohort  Consortium 
TNBCC=Triple  Negative  Breast  Cancer  Consortium 
AABC=African American Breast Cancer Consortium